Team Leader - Nutanix Technology Champion - Nutanix NTC Storyteller

Julien DUMUR
Infrastructure in a Nutshell

We’ve all been there. That moment when your monitoring dashboard shows a beautiful green circle for your Nutanix cluster, while in reality, one of the nodes is struggling. That’s exactly what happened to me recently.

When I integrated Nutanix into my infrastructure, my first instinct was to reach for Centreon, my Swiss Army knife for monitoring. But I quickly realized that the “standard” way of adding a cluster lulls us into a false sense of security: we see the “whole,” but we miss the “detail.”

In this experience report, I’ll share how I monitor Nutanix with Centreon and explain why you should stop monitoring your cluster solely through its Virtual IP (VIP) and switch to a granular, node-by-node strategy.

Why the “default” configuration left me wanting more

When installing the Nutanix Plugin Pack on Centreon, the documentation naturally guides you toward adding a single host representing the cluster.

How the standard Nutanix Plugin Pack works

The classic method involves querying the cluster’s Virtual IP (VIP) or the IP of one of the CVMs (Controller VMs). It’s simple and fast: you enter the SNMP community, apply the template, and the services appear. You then monitor global CPU usage, average storage latency, and the general status reported by Prism.

The “Black Box” problem

This is where the trouble starts. By querying only the VIP, you are talking to an SNMP agent that reports aggregated, cluster-wide data. On a 3-node cluster, the monitoring will tell you that cluster-wide memory is “OK.” But what about the memory load on node #3?

This is what I call the “black box” effect. Nutanix’s Shared Nothing architecture is a strength for resilience, but it can become a blind spot for monitoring if you don’t drill down to the physical layer. For an expert, knowing the cluster is “Up” is not enough; we need to know which specific physical component requires intervention before redundancy is compromised.
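To make the black box effect concrete, here is a small Python sketch comparing an aggregated check with per-node checks on the same data. The node names follow the naming convention used later in this article. The memory figures and the 80% warning threshold are invented for illustration. This is not how the Centreon plugin computes its values; it only shows why an average can hide a saturated node.

```python
# Hypothetical memory figures (GiB) for a 3-node cluster: illustrative only.
NODES = {
    "cluster-2170_n1": {"used": 300, "total": 768},
    "cluster-2170_n2": {"used": 320, "total": 768},
    "cluster-2170_n3": {"used": 710, "total": 768},
}
WARNING_PCT = 80.0  # assumed warning threshold


def usage_pct(used, total):
    return 100.0 * used / total


def cluster_view(nodes):
    """Aggregated view: a single ratio for the whole cluster."""
    used = sum(n["used"] for n in nodes.values())
    total = sum(n["total"] for n in nodes.values())
    return usage_pct(used, total)


def node_view(nodes):
    """Granular view: one ratio per physical node."""
    return {name: usage_pct(n["used"], n["total"]) for name, n in nodes.items()}


def state(value):
    return "WARNING" if value >= WARNING_PCT else "OK"


if __name__ == "__main__":
    total = cluster_view(NODES)
    print(f"cluster-wide: {total:.1f}% {state(total)}")
    for name, value in node_view(NODES).items():
        print(f"{name}: {value:.1f}% {state(value)}")
```

With these numbers, the cluster-wide check reports about 57.7% and stays OK. Node n3, however, sits at about 92.4% and is the only one in WARNING.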

Decoupling monitoring for granular visibility

To break out of this dead end, I changed my approach: I treat each node as its own entity in Centreon. Here’s how I did it.

Step 1: Setting the stage on Prism Element

Before touching Centreon, make sure Nutanix is ready to answer. In Prism Element, open the SNMP settings: there, I configured SNMP v2c access (or v3 if you want maximum security).

Check out my dedicated articles if you need details on how to configure SNMP v2c or SNMP v3 on your Nutanix cluster.
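Before declaring anything in Centreon, it can help to confirm from the poller that the cluster actually answers SNMP. The Python sketch below builds standard net-snmp snmpwalk command lines for v2c and for v3 (authPriv), and runs them only if the tool is installed. The address 192.0.2.10, the community and the credentials are placeholders. The subtree 1.3.6.1.4.1.41263 is my assumption for the Nutanix enterprise OID. Adapt all of them to your setup.

```python
import shutil
import subprocess

# Assumed Nutanix enterprise subtree (enterprises.41263, NUTANIX-MIB).
NUTANIX_OID = "1.3.6.1.4.1.41263"


def snmpwalk_v2c(host, community, oid=NUTANIX_OID):
    """net-snmp snmpwalk command line for SNMP v2c."""
    return ["snmpwalk", "-v2c", "-c", community, host, oid]


def snmpwalk_v3(host, user, auth_pass, priv_pass, oid=NUTANIX_OID,
                auth_proto="SHA", priv_proto="AES"):
    """net-snmp snmpwalk command line for SNMP v3 with authentication and privacy."""
    return ["snmpwalk", "-v3", "-l", "authPriv", "-u", user,
            "-a", auth_proto, "-A", auth_pass,
            "-x", priv_proto, "-X", priv_pass, host, oid]


def run(cmd, timeout=30):
    """Run the command if snmpwalk is installed and return its output."""
    if shutil.which(cmd[0]) is None:
        raise RuntimeError("snmpwalk not found: install the net-snmp tools")
    result = subprocess.run(cmd, capture_output=True, text=True,
                            timeout=timeout, check=True)
    return result.stdout


if __name__ == "__main__":
    # Placeholders: replace with your VIP or CVM address and community.
    cmd = snmpwalk_v2c("192.0.2.10", "my-community")
    print(" ".join(cmd))
    # print(run(cmd))  # uncomment when running on the Centreon poller
```

If the walk returns nothing or times out, check the SNMP settings in Prism and the network path from the poller before blaming the Centreon side.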

Step 2: The “Node by Node” addition strategy in Centreon

This is where the magic happens. Instead of creating a single “Cluster-Nutanix” host, I created as many hosts as I have physical nodes (e.g., cluster-2170_n1, cluster-2170_n2, etc.).

Host Configuration: Each host points to the cluster VIP or to that node’s CVM IP address. At this stage, every host still pulls the same cluster-wide information, but stay tuned.

Applying Templates: I apply the Virt-Nutanix-Hypervisor-Snmp-Custom template.

Surgical Filtering: This is the real secret. In the “Host check options,” I set the custom macro FILTERNAME to the exact name of the node to monitor. The plugin then filters the SNMP data returned through the VIP and keeps only what concerns that specific node.
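One caveat on FILTERNAME: Centreon plugin filters are generally regular expressions matched anywhere in the name. Assuming FILTERNAME behaves that way here (check the plugin help for the hypervisor mode), a plain node name is only safe while no other node name contains it. The Python sketch below uses hypothetical names for a 10-node cluster. It shows "n1" also catching "n10", and how anchoring the pattern avoids the problem.

```python
import re

# Hypothetical hypervisor names for a 10-node cluster;
# with 10+ nodes, "n1" is a prefix of "n10".
HYPERVISORS = [f"cluster-2170_n{i}" for i in range(1, 11)]


def kept_by_filter(filter_name, names):
    """Names an unanchored regex filter keeps (like re.search)."""
    pattern = re.compile(filter_name)
    return [name for name in names if pattern.search(name)]


def exact_filter(name):
    """Escape and anchor a name so the filter matches that name only."""
    return "^" + re.escape(name) + "$"


if __name__ == "__main__":
    print(kept_by_filter("cluster-2170_n1", HYPERVISORS))
    # ['cluster-2170_n1', 'cluster-2170_n10']
    print(kept_by_filter(exact_filter("cluster-2170_n1"), HYPERVISORS))
    # ['cluster-2170_n1']
```

On a 3-node cluster the issue never shows up, which is exactly why it tends to surface only when the cluster grows.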

Step 3: The trick to maintaining Cluster consistency

To keep an overview, I use Host Groups in Centreon. I created a group named HG-Cluster-Nutanix-Prod containing my 3 nodes. This allows me to create aggregated dashboards while keeping the “drill-down” capability (clicking to see details) for each physical machine.
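With many nodes, clicking through the Centreon UI gets tedious. The Python sketch below prints Centreon CLAPI commands covering steps 2 and 3. For each node it creates a host on the VIP, sets the FILTERNAME macro, deploys the template services and joins the HG-Cluster-Nutanix-Prod host group.

It rests on several assumptions, so check them against the CLAPI documentation for your Centreon version before running anything:

- the CLAPI HOST actions ADD (name;alias;address;template;poller;hostgroup), setmacro and applytpl;
- a poller called Central;
- an existing host group;
- each node’s hypervisor name matching its Centreon host name.

```python
import shlex

# Hypothetical values: adapt them to your environment.
CLUSTER = "cluster-2170"
VIP = "192.0.2.10"                  # placeholder address of the cluster VIP
NODE_COUNT = 3
TEMPLATE = "Virt-Nutanix-Hypervisor-Snmp-Custom"
POLLER = "Central"                  # assumed poller (instance) name
HOSTGROUP = "HG-Cluster-Nutanix-Prod"


def clapi_commands(user="admin", password="CHANGEME"):
    """Return one list of CLAPI arguments per command, three per node."""
    base = ["centreon", "-u", user, "-p", password, "-o", "HOST"]
    commands = []
    for i in range(1, NODE_COUNT + 1):
        host = f"{CLUSTER}_n{i}"
        # ADD fields: name;alias;address;template;poller;hostgroup
        fields = [host, host, VIP, TEMPLATE, POLLER, HOSTGROUP]
        commands.append(base + ["-a", "ADD", "-v", ";".join(fields)])
        # Anchored regex so that n1 does not also match n10
        # (assumes the hypervisor name equals the host name).
        commands.append(base + ["-a", "setmacro", "-v", f"{host};FILTERNAME;^{host}$"])
        # Create the services attached to the host template.
        commands.append(base + ["-a", "applytpl", "-v", host])
    return commands


if __name__ == "__main__":
    for cmd in clapi_commands():
        print(shlex.join(cmd))
```

The script only prints the commands. Review them, run them on the central server, then export the configuration to the poller as usual.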

Immediate benefits: Dashboarding and Peace of Mind

Since I switched to this configuration, my daily life as a sysadmin has radically changed:

Granular performance analysis: I can now spot a node consuming noticeably more RAM or CPU than its neighbors. It’s the perfect tool for detecting a hotspot or an uneven VM distribution.

Increased responsiveness: When something goes wrong, Centreon sends me an alert with the specific node name (n1, n2, etc.). No more guessing games in Prism Element to find out where to focus my search.

Clean history: I have metric graphs for each physical server, which greatly simplifies capacity planning and troubleshooting.
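The hotspot detection mentioned under granular performance analysis can be reasoned about with a small Python sketch. It compares each node with the median of its peers and flags the node when the gap exceeds a margin. The CPU figures and the 20-point margin are invented for illustration. This is an illustrative rule for reading per-node graphs, not a Centreon feature.

```python
from statistics import median

# Hypothetical CPU usage (%) per node, as it might appear on per-node graphs.
CPU = {"cluster-2170_n1": 34.0, "cluster-2170_n2": 37.5, "cluster-2170_n3": 71.0}


def hot_nodes(usage, margin=20.0):
    """Nodes whose usage exceeds the median of the other nodes by more than `margin` points."""
    hot = []
    for name, value in usage.items():
        peers = [v for other, v in usage.items() if other != name]
        if peers and value - median(peers) > margin:
            hot.append(name)
    return hot


if __name__ == "__main__":
    print(hot_nodes(CPU))  # ['cluster-2170_n3']
```

Comparing against peers rather than a fixed threshold flags an imbalance even when every node is still below its own alert level.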

Conclusion

If you manage Nutanix, don’t settle for the superficial view offered by the VIP alone. By taking 10 minutes to declare each node as its own host in Centreon with the FILTERNAME macro, you move from “passive” monitoring to a true control tower.

My verdict is clear: node-level monitoring is the only way to keep true high availability under control and sleep soundly at night.
