Team Leader - Nutanix Technology Champion - Nutanix NTC Storyteller

Julien DUMUR
Infrastructure in a Nutshell
OpenClaw on Nutanix AHV

If you read my previous article detailing the architecture and the technical stack I chose to deploy OpenClaw, you already know why I decided to run this solution on my Nutanix AHV cluster. Today, we’re getting practical! I will show you, step by step, how to deploy your own instance on a freshly installed Ubuntu virtual machine.

Before we dive in, here is a quick reminder of my setup. I provisioned a VM on Nutanix AHV with:

  • 8 vCPUs
  • 32 GB of RAM
  • 250 GB of storage
  • an NVIDIA Tesla P4 graphics card in PCI Passthrough

💡 Why favor full Passthrough over vGPU (virtual GPU)? Quite simply, to guarantee near “bare-metal” inference performance. By giving our VM direct and exclusive access to the physical hardware, we eliminate the latency overhead introduced by a GPU virtualization layer.

Let’s start the deployment.

Preparing the Ubuntu VM: System and NVIDIA Drivers

The very first step is to prepare the ground to deploy our AI.

Ubuntu 24.04: Operating System Update

This is a rule I apply every single time I deploy a new operating system. As soon as I connect via SSH, I make sure all packages are up to date to avoid future security flaws or dependency conflicts.

sudo apt update && sudo apt upgrade -y

GPU: Installing NVIDIA Drivers

For OpenClaw to harness the computing power of my Tesla P4, the operating system must be able to communicate with it properly. Here are the commands to run to install the drivers (you can access a more detailed guide on the blog):

sudo apt install nvidia-driver-535-server -y
sudo reboot

Once the machine has rebooted, we log back in and type the command to verify that our GPU is properly detected and ready to work:

nvidia-smi

Node.js and OpenClaw

Installing Node.js 22

OpenClaw is built on Node.js. To ensure we have a recent and efficient runtime environment (here version 22), we add the official NodeSource repository before launching the installation:

curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash -
sudo apt install -y nodejs

Basic OpenClaw Deployment

Now that Node.js is in place, we move on to installing OpenClaw. A simple curl script provided by the developers does the heavy lifting:

curl -fsSL https://openclaw.ai/install.sh | bash

Once the installation is complete, the system automatically launches the configuration wizard for your instance. I will detail this step as well as the creation of API keys (Discord, Telegram, etc.) in a future blog post.

One small adjustment is required right after the OpenClaw installation if we want to use the “openclaw” commands without friction. We need to add the local installation directory to our PATH environment variable (remember to adapt the username if you are not using administrateur):

export PATH="/home/administrateur/.npm-global/bin:$PATH"

💡 Why this step? It’s an excellent security practice that I highly recommend. By pointing the PATH to ~/.npm-global/bin, we avoid installing global NPM packages with root (sudo) privileges. This significantly reduces the attack surface and saves you from the eternal Linux permission conflicts!
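Note that export only affects the current shell session. To make the change permanent, a minimal sketch (assuming the default bash shell; the directory path is the same one used above):

```shell
# Create the user-level npm directory if it does not exist yet
mkdir -p "$HOME/.npm-global/bin"
# Persist the PATH change for future sessions
echo 'export PATH="$HOME/.npm-global/bin:$PATH"' >> "$HOME/.bashrc"
```

Open a new terminal (or run source ~/.bashrc) for the change to take effect.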

Cleanly Exposing OpenClaw with Caddy

By default, the OpenClaw web interface listens on port 18789. Instead of hitting this port directly, I always prefer to place a reverse proxy in front of my applications. For this lab, my choice fell on Caddy.

sudo apt install -y caddy

💡 Why Caddy rather than Apache or Nginx? Because Caddy is remarkably efficient. Where Nginx sometimes requires long configuration blocks for simple proxying, Caddy does the same job in literally three lines of configuration, all while staying ultra-lightweight.

We edit its configuration file:

sudo vi /etc/caddy/Caddyfile

And we replace the entire content with the following instructions (replace the IP with the one of your VM, in my case 192.168.84.134):

192.168.84.134 {
    reverse_proxy 127.0.0.1:18789
}
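One detail worth knowing: since the site address is a bare IP, Caddy cannot obtain a publicly trusted certificate for it and serves the site with a certificate from its internal CA instead, so your browser will show a warning until you trust that local root. You can make this behavior explicit in the Caddyfile (same content as above, tls internal being the only addition):

```
192.168.84.134 {
    tls internal
    reverse_proxy 127.0.0.1:18789
}
```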

Now, all that’s left is to restart the service so the proxy takes over:

sudo systemctl restart caddy

Network Security: Locking Down the OpenClaw Instance

Having a functional instance is good; securing it is essential. Even if you are on your local network (LAN), you should never leave your control interface openly accessible. We are going to apply a strict configuration via the OpenClaw CLI commands.

We start by restricting the Gateway to listening on the local loopback interface, to prevent any direct network access:

openclaw config set gateway.bind loopback

We then force the operating mode to local, and activate token authentication (the bare minimum):

openclaw config set gateway.mode local
openclaw config set gateway.auth.mode token

Finally, since we are going through Caddy, we must authorize Cross-Origin requests (CORS) coming from our IP address, otherwise the browser will block the page (don’t forget to adapt the IP):

openclaw config set gateway.controlUi.allowedOrigins '["https://192.168.84.134"]'

We restart the service to apply our lockdown:

openclaw gateway restart

💡 The security pattern applied here is akin to local “Zero Trust”. By forcing OpenClaw on the loopback (127.0.0.1), we ensure that absolutely all traffic is forced to go through our Caddy proxy. Coupled with CORS filtering and authentication, we provide a baseline protection for our instance against potential scans or malicious scripts on the network.

First Contact and Configuration Validation

Retrieving the Access Token

Now that the doors are locked, we need the key. The authentication token was automatically generated during installation. We’re going to go fish it directly out of the JSON configuration file:

grep -i token ~/.openclaw/openclaw.json
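If you already have jq installed, a slightly more surgical variant is possible. This is a sketch under the assumption that the key holding the token is literally named token somewhere in the JSON; adapt the filter if your file differs:

```shell
# Recursively walk the JSON and print any value stored under a "token" key
# (-r strips the surrounding quotes so the value can be copied as-is)
jq -r '.. | .token? // empty' ~/.openclaw/openclaw.json
```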

Carefully copy this string of characters. Then open your browser and access your Web interface (e.g., https://192.168.84.134).

Enter the token in the “Gateway Token” box.

Device Approval

Once connected, you will notice that something is missing: the system is waiting for us to approve the “device” (the PC or tablet from which we wish to use OpenClaw) to grant it the right to process requests.

Return to your terminal to list the pending devices:

openclaw devices list

Locate your device ID in the list (a UUID-type string) and approve it:

openclaw devices approve b7beb7fa-fa4e-46e9-aec1-282bcce881f6

💡 Device approval (devices approve) is much more than a simple interface formality. It’s a sort of cryptographic handshake. This mechanism guarantees that no unsolicited machine can attach itself to your OpenClaw cluster instance without your knowledge!

Interaction Tests

The OpenClaw instance is now 100% operational! To validate our entire stack, there’s nothing like a full-scale test. You can send a first prompt on the web interface’s integrated chat, or configure a bridge to send a message on the Discord side.

Conclusion

We went from a simple Ubuntu VM to a true secured inference server, powered by Node.js and accelerated by a dedicated NVIDIA Tesla P4 GPU via Nutanix AHV. The architecture is clean, secured behind a Caddy proxy, and ready to handle our requests.

But this is only the beginning. In upcoming articles, we will go even further: I will show you how to configure OpenClaw via the startup wizard, deploy local models via Ollama, create an interactive Discord bot, and even inject Google API keys to equip our AI with search capabilities. Stay tuned!

Read More

If you follow my ramblings on the blog, you know I love tinkering with my clusters and testing somewhat out-of-the-box stuff (see my Steam Deck articles, for example). Recently, I had a thought: Gemini or Claude in the public cloud is great for coding a Python script or writing emails. But when it comes to asking them to interact with our local infrastructure, that’s where things get stuck.

So I wondered how I could connect artificial intelligence closer to my VMs. With this in mind, I got my hands on OpenClaw. Honestly, it was a bit of an obstacle course at the start. No more simple conversational gadgets, here we are talking about deploying a true Private AI on a Nutanix AHV cluster capable of acting on our infrastructure. Let me present the tech stack I chose for this experiment.

What is OpenClaw?

For those who have been living in a cave these past few months, OpenClaw is a GitHub project that exceeded 300k stars in just a few months. Imagine an ultra-intelligent thought translator coupled with a butler. Instead of clicking through dozens of menus in a complex interface, you simply ask your infrastructure to work for you in natural language (via a universal web interface or even messaging apps like WhatsApp and Telegram). It is even capable of working on its own while you sleep!

But where it gets exciting for us engineers is under the hood. OpenClaw is not just another “stateless” Large Language Model (LLM) that forgets everything with each new request. It is a true Agentic Gateway. Concretely, this means it orchestrates autonomous agents equipped with tools. These agents can be configured to tap directly into our cluster’s private APIs (like the REST APIs of Prism Element or Prism Central), code, browse the web, and synthesize certain information. In short, we don’t just ask the AI questions anymore, we delegate tasks to it.

Why Self-hosted?

In the field, the question of data governance arises the second the word “AI” is pronounced. Sending sensitive information to servers over which I have no control is out of the question!

Choosing the self-hosted route with OpenClaw means taking back absolute control. Data flows, execution logs, and API credentials stay safely locked down on my network, isolated from the internet if desired.

Architecture and Tech Stack

For this project, a quick “Next, Next, Finish” installation was out of the question. Here is the robust technical architecture I ended up validating for my deployment.

The Foundation: Nutanix AHV & Ubuntu 24.04 LTS

To run this beast, you need solid foundations. I provisioned a virtual machine running Ubuntu 24.04 LTS hosted directly on my Nutanix AHV cluster.

On the sizing side, I went with 8 vCPUs, 32 GB of RAM, and 250 GB of dedicated storage. You might tell me: “32 GB for a gateway, isn’t that a bit too much?” The gateway will have to ingest substantial data streams, maintain the cache of the various active agents, and potentially handle heavy parallel API querying. And besides, I can allocate these resources in my lab, so why deprive myself?

The Application Engine: Node.js 22

At the heart of OpenClaw, the magic happens thanks to Node.js 22. It is the execution engine that runs the gateway and its AI agent integrations.

Why is Node 22 an excellent architectural choice here? For its asynchronous management (the event loop). When you ask OpenClaw for a status report on 50 VMs, the gateway will initiate multiple API calls to Prism Central while keeping your WebSocket stream open to reply in real time in the chat interface. Node.js excels at this kind of non-blocking concurrency.

Network Routing: Caddy

The usual operating mode for OpenClaw is to deploy it locally on the machine from which you will connect to it, or to set up a tunnel to access the remote instance. Let’s not lie to ourselves, I wanted to type the IP in my browser and be able to access my instance, whether I’m on my PC or my tablet.

To make this possible, I use a Caddy Reverse Proxy. Caddy manages traffic routing and HTTPS encryption fully automatically.

I can already hear you saying: “Yes, but if someone connects to your local network, they will have access to your instance!”. Well, no! Because OpenClaw natively integrates a device whitelisting system. If your PC has never connected to the instance, you will have to provide the “Gateway Token”. Then, you will have to approve this new connection on the OpenClaw instance side. As you can see, only previously authorized devices can use your local instance.

The Entry Point: Discord

The choice of entry point, which will allow you to interact with OpenClaw, is largely a matter of personal taste.

OpenClaw directly integrates a chat system so you can talk to it. It’s native and convenient, but unreachable when I’m not at home. The system also lets you configure external entry points like Telegram, WhatsApp, Discord, or even Teams and Slack. And that is clearly a big plus, because it opens up almost unlimited possibilities!

What’s Next?

The goal of this article was to present the architecture envisioned for my OpenClaw assistant, to understand what we are deploying and why. We therefore have a coherent technical stack, performant thanks to Nutanix AHV, and hosted locally.

In a future article, I will explain how to install OpenClaw step by step until you have a functional instance.

Read More
Nutanix AHV API

I’ll be honest: a while ago, development and APIs weren’t exactly my cup of tea. My playground was the console, SSH, commands typed on the fly. But with the future (and inevitable) blocking of SSH access on Prism Element and Prism Central, I had no choice: I had to get serious about it. And if I’m going to dive into the world of Nutanix APIs from my Windows PC, I might as well do it with the right tools to avoid tearing my hair out. In this article, I’ll show you how I equipped myself with the perfect tools to query Nutanix APIs.

Why Optimize Your Windows Environment for APIs?

For years, my reflex as a sysadmin facing a complex or repetitive task on Nutanix was the same: open PuTTY, connect via SSH to a Controller VM (CVM), and run ncli or acli commands on the fly. It was fast, it was efficient.

But I’ll be direct: that era is over. Nutanix is making a major security shift. SSH access to clusters will be disabled in one of the upcoming releases, relegated to the simple rank of emergency access for support. The only sustainable, supported, and scalable method to interact with your infrastructure is the API. Whether it’s the v2 API to drive a local cluster via Prism Element, or the v3 APIs on Prism Central, API automation has become the norm.

The problem? Windows has historically been ill-suited to handling complex web requests from the command line. PowerShell has made huge progress with Invoke-RestMethod, but when it comes to testing, debugging, and formatting nested JSON, nothing beats a solid Linux foundation coupled with a graphical API client.

That’s where our two best allies come in: WSL (Windows Subsystem for Linux) for the power of the native command line, and Postman for visual exploration of Nutanix APIs. Let’s see how to put all this together.

Solution 1: The System Foundation with WSL (Windows Subsystem for Linux)

How do you avoid wrestling with the quoting of a curl command in the Windows command prompt (cmd), or fighting character escaping in PowerShell? The most elegant and robust solution today is to use the Windows Subsystem for Linux (WSL).

Deploying WSL and Ubuntu in Minutes

The installation is ultra-simple on recent versions of Windows 10 and 11. Open a PowerShell console as an administrator and simply type this magic command:

wsl --install ubuntu

Then restart your PC. You now have a functional Ubuntu distribution, fully integrated into your Windows, without the heaviness of a classic virtual machine. It’s the perfect environment to run your future Bash or Python scripts targeting the Nutanix infrastructure.

Look for the “Ubuntu” icon in the Windows start menu to launch the command prompt on the subsystem.

The Essential Packages: curl and jq

Once in your new Ubuntu terminal, you are missing two vital tools to dialogue with REST APIs: curl (the standard for forging web requests) and jq (the absolute Swiss Army knife for manipulating, filtering, and formatting JSON responses). Install them with these command lines:

sudo apt update && sudo apt upgrade
sudo apt install curl jq -y

Why is jq so critical in our line of work? Let me share a concrete field situation with you. JSON responses returned by Prism Element or Prism Central are often extremely verbose. If I simply want to retrieve the unique identifier (UUID) of my cluster via the v2 API to use it in a script, without drowning in hundreds of lines of configuration, here is the exact command I use:

curl -k -u admin:MyPassword -X GET https://<YOUR_PRISM_ELEMENT_IP>:9440/api/nutanix/v2.0/cluster | jq '.cluster_uuid'

The -k parameter is crucial here: it ignores the SSL certificate warning (the certificate is self-signed by default on Nutanix), and the | jq '.cluster_uuid' instantly filters the raw response to return only the targeted information (in my case: "00064a67-579d-c757-5883-002590b8ef5a"). It’s clean, neat, and easy to capture in a variable to automate a deployment workflow, for example.
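To go one step further, the same call can feed a shell variable directly. A minimal sketch, with the IP, credentials, and endpoint as in the example above (-s silences curl's progress output, and jq's -r strips the surrounding quotes):

```shell
# Capture the cluster UUID for reuse in later API calls
CLUSTER_UUID=$(curl -sk -u admin:MyPassword \
  "https://<YOUR_PRISM_ELEMENT_IP>:9440/api/nutanix/v2.0/cluster" \
  | jq -r '.cluster_uuid')
echo "Cluster UUID: $CLUSTER_UUID"
```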

Solution 2: The Must-Have Graphical API Client: Postman

The command line is great for running production scripts. But when it comes to exploring a new API, testing the parameters of a complex request, or analyzing the structure of a 500-line JSON payload, I prefer a graphical interface. And in this area, Postman is perfectly suited. You can download and install it in seconds from their official website.

Configuring Your First Workspace

The first mistake I made when starting with the Nutanix API (and with Postman in general), is hardcoding my IP addresses, usernames, and passwords in every request. Never do that! Not only is it tedious if you switch clusters, but it’s especially a major risk of information leakage if you share your screen or your collections.

Postman offers a vital feature: Environments. Create a new environment (e.g., “Prod Cluster”) and define three variables in it:

  • cluster_ip: The IP address of your Prism Element or Prism Central.
  • username: Your service account (avoid using the default admin account if possible).
  • password: The associated password (to be configured as “Secret” type to hide it).

From now on, in your requests, you will no longer use the raw URL, but the call to variables between double curly braces: https://{{cluster_ip}}:9440/api/nutanix/v2.0/...

The Expert’s Trick: Importing the Prism Central Swagger

Here is my real “trick” to save you hours. Nutanix APIs, particularly the v3 APIs on Prism Central, are extremely vast. Rather than creating your GET, POST, or PUT requests one by one while laboriously reading the documentation on the Nutanix.dev portal, did you know that you can pull the entire API definition directly from your own cluster?

The Prism Central API exposes its OpenAPI specification (Swagger). In Postman, click on the “File > Import” button in the top left menu, choose “Link”, and simply paste this URL (replacing the IP with your Prism Central’s IP): https://<PRISM_CENTRAL_IP>:9440/static/v3/swagger.json

Let the magic happen: Postman will query your cluster and automatically generate a complete Collection containing absolutely all possible v3 API requests, preformatted with the right headers and sample payloads. It’s a massive time saver for exploration!

Test: The First API Call from Postman

Now that the tooling is ready and my variables are configured, it’s time to make the first graphical request to the cluster to retrieve its global information.

Managing Authentication and Bypassing the SSL Trap

Create a new request in Postman (+ button or New > HTTP Request). Select the GET method and enter the following URL using our variable: https://{{cluster_ip}}:9440/api/nutanix/v2.0/cluster

Before clicking “Send”, we have two settings left to make:

  1. Authentication: Go to the Authorization tab, choose the Basic Auth type. In the Username and Password fields, type {{username}} and {{password}} respectively. Postman will replace these values on the fly.
  2. The SSL Certificate: By default, Nutanix uses self-signed certificates. If you run the request now, Postman will block the call with a security error. Go to File > Settings (or the gear icon), General tab, and disable the “SSL certificate verification” option. This is the graphical equivalent of our -k parameter in curl.

Click Send. If everything is green (Status 200 OK), you should see a beautiful formatted JSON appear at the bottom, containing your cluster’s UUID, its name, its AOS version, and its virtual addresses. Congratulations, your workstation is communicating with Nutanix!

Basic Auth vs JSESSIONID

If you are just starting out, the Basic Auth method (which sends your credentials with each request) is perfect. But beware: this method has a real performance impact.

Why? Because every time you make an API call in Basic Auth, the CVM’s Acropolis service must validate your credentials with the authentication module (and often, these credentials will be linked to an Active Directory via LDAP). If you run a script that makes 500 requests in a row to inventory VMs, you will trigger 500 identity validations. This unnecessarily saturates the CVMs and your domain controllers.

The best practice if you script massively: Authenticate only once, and use session Cookies! When you make an initial authentication request or query the API, Nutanix sends you back a cookie named JSESSIONID. Postman stores it automatically and uses it for subsequent requests in your collection. In your future Bash/Python scripts, always remember to retrieve this cookie during the first call, and pass it in the Headers of your subsequent calls. You will drastically relieve the management plane of your cluster!
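In curl, this pattern is easy to sketch with a cookie jar (the IP and credentials are placeholders to adapt, and the VMs endpoint is just an example of a follow-up call):

```shell
# Call 1: authenticate once, save the JSESSIONID cookie to a jar file (-c)
curl -sk -u admin:MyPassword -c /tmp/nutanix-cookies.txt \
  "https://<PRISM_IP>:9440/api/nutanix/v2.0/cluster" > /dev/null
# Calls 2 onwards: replay the stored cookie (-b); no credentials are sent,
# so the CVM skips the identity validation round-trip
curl -sk -b /tmp/nutanix-cookies.txt \
  "https://<PRISM_IP>:9440/api/nutanix/v2.0/vms"
```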

Conclusion: Security Reminder and Advanced Usage

All the tools are now in place to free myself as much as possible from SSH during my next troubleshooting sessions.

I must give a fundamental security reminder. Postman allows you to export your collections to share them with your colleagues or back them up. It’s great for teamwork. But beware: if you haven’t used environment variables as we saw earlier, and you typed your passwords “hardcoded” directly in the Authorization tab of your requests, they will be exported in clear text in the collection’s JSON file.

Always ensure your “Secrets” remain in your local Environment configuration (which, by default, does not export current values with the collection). I’ve seen too many admin passwords lying around on the network because of this!

Now, all I have to do is look into the scripting and automation part to be able to develop applications that will help me drive, audit, and configure my Nutanix clusters.

But that will be the subject of a future article! Until then, happy querying to all.

Read More
GPU Nutanix AHV Linux

Integrating graphics processing power within virtualized environments has become a must. Whether it’s to run Artificial Intelligence models, Machine Learning, or simply for intensive video processing, our virtual machines increasingly need muscle.

When I talk with clients, I often get questions about this: how do you assign a physical graphics card to a VM in a simple and performant way?

Today, I suggest we look together at how to deploy an NVIDIA Tesla P4 GPU on an Ubuntu Server 24.04 VM hosted on Nutanix AHV, using “Passthrough” mode.

1. Prerequisites

Before getting our hands dirty, let’s take a moment to check our equipment. Good preparation is half the work done! I myself have lost hours in the past due to a simple forgotten prerequisite.

To follow this tutorial, you will need:

  • A Nutanix node (physical cluster) equipped with at least one NVIDIA Tesla P4 card.
  • A virtual machine running Ubuntu Server 24.04.
  • Functional SSH access to this VM with sudo privileges.

Although Nutanix AHV handles this transparently for you, keep in mind that Passthrough mode relies on specific hardware instructions. It requires that I/O virtualization extensions (VT-d for Intel or AMD-Vi for AMD) are properly enabled in the BIOS of your physical node. If you ever build a “home-lab” cluster, this is the first thing to check!

2. Nutanix Configuration: Passthrough Mode

Now that our foundation is solid, let’s move on to the administration interface. This is where the magic happens. Whether you use Prism Element or Prism Central, the logic remains the same. First, make sure your virtual machine is powered off.

Go to your VM’s settings, select “Update”, and scroll down to the “GPUs” section. Click on “Add GPU”.

In the window that opens, the choice is crucial: in the “GPU Type” drop-down menu, select Passthrough mode, then choose your Tesla P4 from the list.

Passthrough mode is a special feature: it allows you to “hand over the keys” of the physical graphics card directly to the virtual machine. The guest OS has the illusion (and the benefits) of physically owning the card.

You might be wondering why we prefer Passthrough over vGPU? It’s a matter of use case, but also architecture.

vGPU allows you to virtually slice a card to share it among several VMs, which is great for VDI, but it requires the installation and maintenance of an NVIDIA license server (vGPU Software).

Passthrough, on the other hand, dedicates 100% of the Tesla P4’s power to our Ubuntu VM, without any additional license server. For a raw single-VM performance need, it’s clearly the best option.

3. Preparation and Ubuntu 24.04 Update

Once the GPU is attached, save the configuration, power on your VM, and connect via SSH.

Before we rush into installing the NVIDIA drivers, there’s one step I absolutely never skip: updating the system.

Simply type this command:

sudo apt update && sudo apt upgrade -y

Proprietary NVIDIA drivers rely on the DKMS (Dynamic Kernel Module Support) system to compile kernel modules on the fly during installation.

If your kernel headers are not perfectly synchronized with your current Linux kernel version, the installation will fail silently. A freshly updated system is your best guarantee for a clean, hitch-free compilation!
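Before launching the driver install, a quick sanity check confirms that the headers for the running kernel are present (a sketch assuming a Debian/Ubuntu system, which is our case here):

```shell
# DKMS compiles the NVIDIA module against the running kernel's headers
if dpkg -s "linux-headers-$(uname -r)" >/dev/null 2>&1; then
  echo "Kernel headers present for $(uname -r)"
else
  echo "Headers missing; run: sudo apt install linux-headers-$(uname -r)"
fi
```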

4. NVIDIA Drivers Installation (Server Branch)

Now that our system is clean and updated, let’s move on to the main course. On Ubuntu, we often have the reflex to use the ubuntu-drivers autoinstall command or install the latest trendy “desktop” version. Let me stop you right there!

For a server, especially in production, stability is key. That’s why we are going to install the “Server” branch of the driver. Type the following command:

sudo apt install nvidia-driver-535-server -y

💡 The Expert’s Tip: Why specifically the “server” package? NVIDIA maintains specific driver branches for data centers. The “Server” branch (or Tesla driver) is designed for long lifecycles (LTS) and minimizes the risk of regression. Installing a “Desktop” driver on a hypervisor or an AI server means risking a minor update breaking your production environment on a Friday at 5 PM!

Let the installation finish (this may take a few minutes while DKMS compiles the module for your kernel). Once completed, a reboot is mandatory to properly load the driver:

sudo reboot

5. Validation

After the reboot, reconnect via SSH to your virtual machine. To verify that the OS, the driver, and the hardware are communicating perfectly, NVIDIA provides us with an essential command-line tool:

nvidia-smi

If all went well, you should see a beautiful ASCII dashboard appear.

💡 Explanation: At the top right, note the CUDA Version (here 12.2): this is the maximum version of the CUDA API supported by this driver, crucial info for AI developers. Also look at the Perf column: it indicates “P0”, which corresponds to the maximum performance state (the P-states range from P0 down to P12 for power saving). If your card stays stuck in a power-saving P-state while under load, there’s a hardware or thermal issue! Finally, this output confirms that the OS is ready to host the NVIDIA Container Toolkit (for Docker with GPU support).

Conclusion

And there you have it! In a few simple steps, we successfully presented our NVIDIA Tesla P4 GPU directly to our Ubuntu 24.04 VM under Nutanix AHV. Passthrough mode gave us a high-performance, low-overhead configuration without the need for an external license server.

Our Tesla P4 is now properly installed! The next logical step? Deploying LLMs (Large Language Models) or compute-heavy applications in isolated containers. But that will be for a future article on the blog!

Read More

I will never forget the day the reality of hyperconvergence hit me. We were in the middle of an infrastructure migration. On one side, we had two full 42U racks from the 3-tier era, packed with servers and storage arrays. On the other side, to replace them, we only needed… 6U.

Two 2U Nutanix blocks (with 4 nodes in each block) and two Top of Rack switches. That was it. 84 rack units reduced to 6. The contrast was so stark it almost felt suspicious. How could such a small physical footprint replace our historic cabinets?

But make no mistake. Beneath this apparent simplicity lay a major technological rupture. We had moved from a “Hardware-Defined” era, where intelligence resided in expensive proprietary ASICs, to a “Software-Defined” era.

This void in the racks wasn’t just aesthetic. It told another story: one of exploding density, radically changing the economic equation of the datacenter. Less cooling, less floor space, less power consumption for tenfold computing power. The storage array hadn’t disappeared: it had been absorbed and virtualized by software.

The Legacy of Web Giants

To understand where this magic comes from, we have to go back to the early 2000s, far from air-conditioned enterprise server rooms, into the labs of Google and Amazon.

At that time, these giants were hitting a wall: the 3-tier model didn’t scale. To index the entire web, using traditional storage arrays like EMC or NetApp would have cost an astronomical amount. They had to find another way.

Their stroke of genius was to turn the problem on its head. Instead of buying “premium” hardware designed never to fail (and sold at a premium price), they decided to use “commodity hardware”: standard x86 servers, cheap, almost disposable.

The philosophy changed completely: hardware will fail. It is a statistical certainty. Rather than fighting this reality with redundant components, they decided to manage failure at the software level.

For purists and tech historians, the founding moment is captured in a PDF document published in October 2003: The Google File System (SOSP’03). This research paper is the bible of modern infrastructure. It describes a system where thousands of unreliable hard drives are aggregated by intelligent software that ensures resilience. If a drive dies? The system doesn’t care. No need to rush to replace the disk at 3 AM. The software has already replicated the data elsewhere.

Hyperconvergence is simply the arrival of this “Web Scale” technology, packaged and democratized for our enterprises.

Anatomy of an HCI Node: How Does It Work?

Concretely, what changes at the hardware level? In a hyperconverged infrastructure, we no longer separate Compute and Storage. Everything is reunited in the same chassis, called a “Node”.

Each node contains its own processors, RAM, and its own disks (SSD, NVMe, HDD). But unlike a classic server, these disks aren’t just for installing the local OS. They are aggregated with the disks of other nodes in the cluster to form a global storage pool.

This is where the real revolution comes in: the CVM (Controller VM).

Imagine taking the physical controllers of your old SAN array (the compute part) and turning them into software. On each physical server in the cluster, a special virtual machine (the CVM) runs permanently. It is the conductor.

For the technical expert, the feat lies in hardware management. The hypervisor (ESXi or AHV) does not manage the storage disks. Thanks to a technology called PCI Passthrough (or I/O Passthrough), the CVM bypasses the hypervisor to speak directly to the disks. Result: raw performance without the classic virtualization overhead.

The Strengths of Hyperconvergence

Beyond the hype, three technical arguments have hit the mark in enterprises.

1. Scale-Out (The LEGO Approach)

Gone is the headache of 5-year sizing. With 3-Tier, when the array was full, it was panic mode (Scale-Up). With HCI, if you need more resources, you buy a new node and plug it in. The cluster automatically absorbs the new CPU power and storage capacity. It is linear and predictable growth.

2. Data Locality

This is the Holy Grail of performance. In a classic architecture, data had to cross the SAN network to reach the processor. With HCI, software intelligence ensures that data used by a VM is (whenever possible) stored on the disks of the physical server where it is running. The path is near-instantaneous. The network is no longer a bottleneck.

3. Distributed Rebuild (Many-to-Many)

This is often the argument that finally convinces administrators traumatized by RAID rebuilds. On a classic array (RAID 5 or 6), if a 4TB drive breaks, a single “hot spare” drive has to rewrite everything. This can take days, during which performance collapses. In HCI, data is replicated in chunks all over the cluster. If a drive dies, all other disks in all other nodes participate simultaneously in reconstructing the missing data. We move from a “1 to 1” problem to a “Many to Many” solution. Result: resilience is restored in minutes.
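To put rough numbers on that “1 to 1” versus “Many to Many” difference, here is a back-of-envelope sketch. The capacity, throughput, and peer count are illustrative assumptions, not measurements from a real cluster:

```shell
#!/usr/bin/env bash
# Back-of-envelope comparison (illustrative numbers only):
# rebuilding 4 TB onto a single hot spare vs. spreading the same
# rebuild across many disks that each contribute write bandwidth.

DATA_GB=4000          # capacity of the failed drive (assumption)
SPARE_MBPS=100        # sustained write speed of one spare drive (assumption)
PEERS=20              # disks participating in a distributed rebuild (assumption)

# Single hot spare: all 4 TB funnel through one drive.
single_spare_hours=$(( DATA_GB * 1024 / SPARE_MBPS / 3600 ))

# Distributed rebuild: every peer absorbs a slice in parallel.
distributed_minutes=$(( DATA_GB * 1024 / (SPARE_MBPS * PEERS) / 60 ))

echo "1-to-1 rebuild : ~${single_spare_hours} hours"
echo "Many-to-many   : ~${distributed_minutes} minutes"
```

With these (hypothetical) figures, the single spare grinds for about half a day while the distributed rebuild finishes in tens of minutes, which is the whole point of the Many-to-Many approach.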

The Weaknesses: What Marketing Forgets to Mention

If hyperconvergence seems magical, it is not without flaws. As an expert, it is crucial to understand the trade-offs of this architecture.

The first is the “CVM Tax”. Intelligence isn’t free. Since the storage controller is now software, it consumes CPU and RAM resources that are no longer available for your applications. On very small clusters, reserving 20GB or 24GB of RAM per node just to “run the shop” can seem heavy, even if it is the price of peace of mind.

The second technical limitation is the critical dependence on “East-West” network traffic. In a 3-Tier array, replication traffic remained confined within the array. In HCI, to secure data (RF2 or RF3), the CVM must write it locally but also immediately send it over the network to another node. If your 10/25 GbE network is unstable or poorly configured, the entire performance and stability of the cluster collapses. The network is no longer a simple commodity; it is the nervous system of your cluster. I repeat it to every client: an HCI cluster is 80% network. If your network has a problem, your HCI cluster has a problem.

Nutanix, The Pioneer

Hyperconvergence marked the end of an era. It proved that software could supplant specialized hardware, transforming our rigid datacenters into agile private clouds.

But an idea, however brilliant (like the Google File System), is useless if it remains confined to a research lab. Someone had to take these complex concepts and make them accessible to any system administrator in less than an hour.

That is where Nutanix comes in.

Founded by former Google employees who had worked on GFS, this company created NDFS (Nutanix Distributed File System). They pulled off the crazy bet of running a “Google-style” infrastructure on standard Dell, HP, or Lenovo servers.

How did Nutanix manage to become the undisputed leader of this market, surviving even the assault of VMware with vSAN? That is what we will dissect in the next article of this series.

Read More

Let’s be honest: shutting down a complete Nutanix cluster is always a bit stressful. Even after 15 years in the business. Why? Because even with the best HCI technology on the market, cutting the power on an IT infrastructure is never trivial.

I’ve seen too many “cowboys” pull the plug or perform a brutal “Shutdown” via IPMI, thinking data resiliency would handle the rest. Spoiler alert: this often ends with Level 3 Nutanix support on the line to recover corrupt Cassandra metadata or with the loss of one or more disks.

This guide is my lifeline to ensure my cluster restarts without issues. No GUI, no Prism Element for the critical steps. We open the terminal, connect via SSH, and do it properly.

Phase 1: Health Checks

Before even thinking about stopping a single VM, you must ensure the cluster is capable of stopping (and more importantly, restarting). If your cluster is already suffering, shutting it down is not always a good option.

1.1 SSH Connection to the CVM

Open your favorite terminal (PuTTY works just fine) and connect via SSH to the cluster’s virtual IP address (Cluster VIP) with the user nutanix.

1.2 Nutanix Cluster Checks (NCC)

To make sure the cluster is healthy, run a full NCC scan:

ncc health_checks run_all

My advice: Don’t just skim through the report. If you have a “FAIL” on Cassandra, Zookeeper, or Metadata, STOP. Fix it before shutting down. A warning about a full disk or an old NTP alert is acceptable. But data integrity is non-negotiable.

1.3 Resiliency Verification

The Prism dashboard is pretty; it tells you “Data Resiliency Status: OK”. That’s good, but it’s not precise enough for a total shutdown. I want to know if my data is truly synchronized, right now.

Type this command and read the output line by line:

ncli cluster get-domain-fault-tolerance-status type=node

What you need to see: A line indicating Current Fault Tolerance: 2 (or 1 depending on your RF configuration).

If you see a state indicating a rebuild in progress, do not shut down the cluster; wait for the rebuild to finish first.
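If you script your maintenance window, you can gate it on this check. Here is a minimal sketch that assumes the ncli output contains lines of the form “Current Fault Tolerance : N”; the exact layout varies between AOS versions, so adjust the parsing to what your cluster actually prints:

```shell
#!/usr/bin/env bash
# Hypothetical guard: parse the ncli output and refuse to continue unless
# every domain reports at least the expected fault tolerance value.

check_fault_tolerance() {
  local expected="$1"   # minimum acceptable value (1 for RF2, 2 for RF3)
  local output="$2"     # captured output of the ncli command
  local values ft
  # Keep only the trailing number of each "Current Fault Tolerance : N" line.
  values=$(echo "$output" | grep -i 'Current Fault Tolerance' | grep -o '[0-9]*$')
  [ -n "$values" ] || return 1   # nothing parsable: refuse to proceed
  for ft in $values; do
    [ "$ft" -ge "$expected" ] || return 1   # a domain is degraded
  done
  return 0
}

# Usage (on the CVM):
#   out=$(ncli cluster get-domain-fault-tolerance-status type=node)
#   check_fault_tolerance 1 "$out" || { echo "Rebuild in progress, aborting"; exit 1; }
```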

Phase 2: Shutting Down Workloads

Once the cluster is validated as healthy, we move on to the virtual machines. The classic mistake is rushing to stop the nodes: the cluster will refuse to stop while virtual machines are still running on it.

2.1 The Battle Order

Start by shutting down your test/dev environments, then application servers, and finally databases. It’s common sense, but it’s always good to be reminded.

Once all production machines are off, you can now shut down the remaining “tooling” VMs of your infrastructure: AD, DNS, firewalls…

2.2 Managing Prism Central

Connect to Prism Central via SSH with the nutanix account, then run the stop command:

cluster stop

Wait for the PCVM services to stop and verify that the cluster is indeed stopped:

cluster status

If all services are stopped and the cluster status is “stop”, we can now proceed to shut down the PCVM:

sudo shutdown -h now

Phase 3: Stopping Nutanix Services (“Cluster Stop”)

Your VMs and Prism Central are off. Your hosts are running nothing but CVMs (Controller VMs). This is the critical moment. We never perform an OS shutdown of the CVMs without first stopping the cluster services properly.

Why? Because a brutal shutdown of CVMs can lead to data corruption or metadata inconsistencies that might require support intervention.

3.1 Stopping the Cluster

Reconnect to your Nutanix cluster VIP and simply type:

cluster stop

The system will ask for confirmation before launching operations. Type Y.

This command orders each CVM to stop its services in a precise order. The Stargate service (which handles storage I/O) ensures everything is “flushed” to disk before shutting down.

You will see lines scrolling by indicating the stop of Zeus, Scavenger, Cassandra, etc. Be patient. Depending on the cluster size, this can take 2 to 5 minutes.

3.2 Verification

Once the operation is complete, check the actual state of services:

cluster status

What you need to see: A list of services for each CVM. They must all be in the DOWN state, with the potential exception of the Genesis service which may remain UP; this is normal.

If you see other services still UP, wait a minute and run the check again. Do not proceed until the cluster is logically fully stopped.
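To avoid eyeballing the output, a small helper can do the check for you. This is a sketch that assumes each service line of cluster status contains an UP or DOWN token; verify it against your actual output before relying on it:

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 only when no service is still UP,
# Genesis excepted (it may legitimately remain UP after cluster stop).

all_services_down() {
  local status_output="$1"
  # Keep UP lines, drop Genesis; if anything survives, we are not done.
  if echo "$status_output" | grep -w 'UP' | grep -v -i 'genesis' | grep -q .; then
    return 1   # at least one non-Genesis service still UP
  fi
  return 0
}

# Usage (on the CVM):
#   all_services_down "$(cluster status)" && echo "Safe to shut down CVMs"
```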

Phase 4: Shutting Down CVMs and Physical Nodes

We are at the end of the tunnel. The cluster is logically stopped. Only empty shells remain: the CVMs (which are Linux VMs, let’s not forget) and the hypervisors.

4.1 Stopping CVMs

You must now connect to each CVM individually (via its IP, no longer via the VIP) and run the shutdown command.

The official command:

cvm_shutdown -P now

The cvm_shutdown command contains specific hooks to notify the hypervisor. Repeat the operation on each node of the cluster.

4.2 Stopping Hypervisors

Once the CVMs are off, connect to your hosts (via SSH or IPMI) and on each of them type the following command:

shutdown -h now

The Expert Nugget: The Automation Script ⚡

Do you have a 16-node cluster and don’t feel like connecting 32 times (16 CVMs + 16 hosts)? I get it.

Here is a script to run from any CVM in the cluster that will shut down all CVMs, then all AHV hosts.

⚠️ WARNING: This script asks no questions. Ensure you have validated Phase 3 (cluster stop) before launching this, otherwise, a crash is guaranteed.

The “Kill Switch” Script (For AHV)

From a CVM, this script retrieves the IPs of other CVMs and hosts, then sends the shutdown order in sequence.

for svmip in $(svmips); do ssh -q nutanix@"$svmip" "sudo /usr/sbin/shutdown +1 ; hostname"; done
for hostip in $(hostips); do ssh -q root@"$hostip" "/usr/sbin/shutdown +3 ; hostname"; done
  • The first command orders the shutdown of CVMs after a one-minute delay.
  • The second command orders the shutdown of hosts after a three-minute delay.

Once you have launched the commands, you will lose connection after one minute. You can then monitor the shutdown of your nodes from their respective IPMI interfaces.

Phase 5: Powering Back Up (Cold Boot)

The maintenance period is over. What do we do? Press ON and pray? No, we follow the reverse order.

  1. Physical Network: Turn on your Top-of-Rack switches first. If the network isn’t there, the nodes won’t see each other upon booting.
  2. IPMI / Physical: Turn on the physical nodes.
  3. Patience: AHV will boot, then automatically start the CVM.
    • Tip: Don’t touch anything for 10 minutes. Let the CVMs form the cluster.
  4. Starting the Cluster: Connect via SSH to a CVM. Verify that all CVMs are up (svmips should list them all). Then run: cluster start
  5. Verify that the cluster started properly with: cluster status
  6. Starting Workloads: Once the cluster is UP, power on the PCVM first, then your VMs (infrastructure first, applications second).
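For step 5, rather than re-running cluster status by hand, you can poll it in a bounded loop. A minimal sketch, with two assumptions: the status command is passed in as a parameter, and a service that is not ready prints a DOWN token on its line:

```shell
#!/usr/bin/env bash
# Hypothetical wait loop: poll the status command until no service line
# reports DOWN, with a bounded number of retries.

wait_for_cluster_up() {
  local tries="$1"; shift   # remaining args: the status command to run
  local i
  for ((i = 0; i < tries; i++)); do
    # Consider the cluster up when no line contains the DOWN token.
    # (If the command itself fails, the output is empty -- run this
    # from a CVM where `cluster status` is available.)
    if ! "$@" | grep -qw 'DOWN'; then
      return 0
    fi
    sleep 1
  done
  return 1   # still not up after all retries
}

# Usage (on the CVM):
#   wait_for_cluster_up 60 cluster status && echo "Cluster is UP"
```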

Conclusion

Shutting down a Nutanix cluster is a simple procedure but requires good sequencing. It’s not complicated, but it doesn’t forgive impatience. If you follow these steps, you’ll sleep soundly during the power outage.

Read More

We’ve all been there. That moment when your monitoring dashboard shows a beautiful green circle for your Nutanix cluster, while in reality, one of the nodes is struggling. That’s exactly what happened to me recently.

When I integrated Nutanix into my infrastructure, my first instinct was to pull out Centreon. Why? Because it’s my Swiss Army knife for monitoring. But I quickly realized that the “standard” method of adding a cluster locks us into an illusion of security. We see the “whole,” but we miss the “detail.”

In this article, I’ll share my experience with monitoring Nutanix with Centreon and explain why you should stop monitoring your cluster solely through its Virtual IP (VIP) and switch to a granular node-by-node strategy.

Why the “default” configuration left me wanting more

When installing the Nutanix Plugin Pack on Centreon, the documentation naturally guides you toward adding a single host representing the cluster.

How the standard Nutanix Plugin Pack works

The classic method involves querying the cluster’s Virtual IP (VIP) or the IP of one of the CVMs (Controller VM). It’s simple and fast: you enter the SNMP community, apply the template, and the services appear. You then monitor global CPU usage, average storage latency, and the general status reported by Prism.

The “Black Box” problem

This is where the trouble starts. By querying only the VIP, you are actually querying an SNMP agent that aggregates data. If you have a 3-node cluster, the monitoring will tell you that the cluster-wide memory is “OK.” But what about the memory load on node #3?

This is what I call the “black box” effect. Nutanix’s Shared Nothing architecture is a strength for resilience, but it can become a blind spot for monitoring if you don’t drill down to the physical layer. For an expert, knowing the cluster is “Up” is not enough; we need to know which specific physical component requires intervention before redundancy is compromised.

Decoupling monitoring for granular visibility

To break out of this deadlock, I changed my approach: treating each node as its own entity in Centreon. Here’s how I did it.

Step 1: Setting the stage on Prism Element

Before touching Centreon, you must ensure Nutanix is ready to talk. Head to Prism Element, in the SNMP settings. Here, I configured SNMP v2c access (or v3 if you want to max out security).

Check out my dedicated articles if you need details on how to configure SNMP v2c or SNMP v3 on your Nutanix cluster.

Step 2: The “Node by Node” addition strategy in Centreon

This is where the magic happens. Instead of creating a single “Cluster-Nutanix” host, I created as many hosts as I have physical nodes (e.g., cluster-2170_n1, cluster-2170_n2, etc.).

Host Configuration: Each host points to the cluster’s VIP address or the specific node’s CVM IP. By default, this pulls the same global information for every host, but stay with me.

Applying Templates: I apply the Virt-Nutanix-Hypervisor-Snmp-Custom template.

Surgical Filtering: This is the key secret. In the “Host check options,” I apply the custom macro FILTERNAME. This allows me to specify the exact name of the host to monitor. The plugin then filters the SNMP data sent by the VIP to return only what concerns my specific node.
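For reference, here is what such a host ends up looking like in Centreon. All names and the IP below are hypothetical examples; the FILTERNAME value must match the node’s name exactly as exposed over SNMP:

```
# Hypothetical Centreon host definition (example values only)
Host name         : cluster-2170_n1
IP address        : 192.0.2.10                                # cluster VIP or node CVM IP
Host template     : Virt-Nutanix-Hypervisor-Snmp-Custom
SNMP community    : <your v2c community>                      # or SNMP v3 credentials
Macro FILTERNAME  : NTNX-Node-1                               # exact node name to filter on
```

Repeat the definition for each physical node (cluster-2170_n2, and so on), changing only the FILTERNAME macro and, if you query the CVMs directly, the IP.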

Step 3: The trick to maintaining Cluster consistency

To keep an overview, I use Host Groups in Centreon. I created a group named HG-Cluster-Nutanix-Prod containing my 3 nodes. This allows me to create aggregated dashboards while keeping the “drill-down” capability (clicking to see details) for each physical machine.

Immediate benefits: Dashboarding and Peace of Mind

Since I switched to this configuration, my daily life as a sysadmin has radically changed:

Granular performance analysis: I can now spot a node consuming abnormally more RAM or CPU than its neighbors. It’s the perfect tool for detecting a hot spot or a VM distribution issue.

Increased responsiveness: When something goes wrong, Centreon sends me an alert with the specific node name (n1, n2, etc.). No more guessing games in Prism Element to find out where to focus my search.

Clean history: I have metric graphs per physical server, which greatly facilitates Capacity Planning and troubleshooting.

Conclusion

If you manage Nutanix, don’t settle for the superficial view offered by the VIP alone. By taking 10 minutes to declare your hosts individually in Centreon with the FILTERNAME macro, you move from “passive” monitoring to a true control tower.

My verdict is clear: node-level monitoring is the only way to guarantee true high availability and sleep soundly at night.

Read More

I still remember my first time entering a “serious” server room back in the mid-2000s. What struck me wasn’t so much the deafening roar of the air conditioning, but the physical density of the infrastructure.

Back then, to run a few hundred virtual machines, you didn’t just need “a cluster.” You needed entire rows. Power-hungry Blade Centers, monstrous Fibre Channel switches with their characteristic orange cables, and above all, sitting in the center of the room like a sacred totem: the Storage Array. Entire cabinets filled with 10k RPM mechanical disks, weighing as much as a small car and devouring rack units (‘U’) by the dozen.

This is what we call the 3-Tier architecture. While Hyperconvergence (HCI) and Public Cloud seem to be the norm today, it is crucial to understand that 3-Tier was the backbone of enterprise IT for nearly 20 years. To understand this architecture is to understand where we come from, and why we sought to change it.

In this article, the first in a series that will present the evolution of 3-tier virtualization infrastructures towards Nutanix hyperconverged infrastructures, we will factually dissect this standard: how it works, why it dominated the market, and the technical limits that eventually rendered it obsolete for modern workloads.

Genesis: Why Did We Build It This Way?

To understand 3-Tier, you have to go back to the pre-virtualization era. A physical server hosted a single application (Windows + SQL, for example). It was the “Silo” model. Inefficient, expensive, and a nightmare to manage.

Virtualization (led by VMware) arrived with a promise: consolidate multiple virtual servers onto a single physical server. But for this magic to happen, there was an absolute technical condition: mobility.

For a VM to move from physical server A to physical server B without service interruption (the famous vMotion), both servers had to see exactly the same data, at the same moment.

This is where the architecture split into three distinct layers:

  1. We removed the disks from the servers (which now only do computing).
  2. We centralized all data in external shared storage (the Array).
  3. We connected everything via a dedicated ultra-fast network (the SAN).

It was a revolution: the server became “disposable,” or at least interchangeable, because it no longer held the data. But this centralization created a single point of complexity and performance: shared storage. It is the heart of the reactor, but also its Achilles’ heel.

The Anatomy of 3-Tier: Decoupling the Layers

If we were to draw this architecture, it would look like a three-layer cake, where each layer speaks a different language.

1. The Compute Layer

At the very top, we have the physical servers (Hosts). They run the hypervisor (ESXi, Hyper-V, KVM). Their role is purely mathematical: providing CPU and RAM to the virtual machines.

These servers are “Stateless”. They store nothing persistent. If a server burns out, it doesn’t matter: we restart the VMs on its neighbor (HA).

This logic was pushed to the extreme with “Boot from SAN”. We even ended up removing the small local disks (SD cards or SATA DOM) that contained the hypervisor OS so that the server was a total empty shell, loading its own operating system from the distant storage array. A technical feat, but a nightmare in case of SAN connectivity loss.

2. The Network Layer (SAN)

In the middle sits the Storage Area Network. It is the highway that transports data between the servers and the array. Historically, this didn’t go through classic Ethernet (too unstable at the time), but through a dedicated protocol: Fibre Channel (FC).

It is a deterministic and lossless network. Unlike Ethernet which does “best effort,” FC guarantees that packets arrive in order.

If you have ever administered a SAN, you know the pain of Zoning. You had to manually configure on the switches which port (WWN) was allowed to talk to which other port. A single digit error in a 16-character hexadecimal address, and your production cluster would stop dead. It was a task so complex that it often required a dedicated team (“The SAN Team”).

3. The Storage Layer

At the very bottom, the Storage Array. It is a giant computer specialized in writing and reading blocks of data. It contains controllers (the brains) and disk shelves (the capacity).

The array aggregates dozens or even hundreds of physical disks to create large virtual volumes (LUNs) that it presents to the servers. It ensures data protection via hardware RAID.

All the intelligence resides in two controllers (often in Active/Passive or Asymmetric Active/Active mode). This is an architectural bottleneck: no matter if you have 500 ultra-fast SSDs behind them, if your two controllers saturate in CPU or cache, the entire infrastructure slows down. This is called the “Front-end bottleneck”.

The Strengths: Why This Model Ruled the World

It’s easy to criticize 3-Tier with our 2024 eyes, but we must recognize that it brought incredible stability.

  1. Robustness and Maturity: This is hardware designed never to fail. Storage arrays have redundant components everywhere (power supplies, fans, controllers, access paths). We talk about “Five Nines” (99.999% availability).
  2. Fault Isolation: If a server crashes, the storage lives on. If a disk breaks, hardware RAID rebuilds it without the server even noticing (or almost).
  3. Scale-Up Independence: This was the killer argument. Running out of space but your CPUs are idling? You just buy an extra disk shelf. Running out of power but have plenty of space? You add a server. You could size each tier independently.

The Weaknesses: The Other Side of the Coin

Despite its robustness, the 3-Tier model began to show serious signs of fatigue in the face of modern virtualization. For us admins, this translated into shortened nights and a few premature gray hairs.

Operational Complexity

The greatest enemy of 3-tier is not failure, it’s the update. Imagine having to update your hypervisor version (ESXi). You can’t just click “Update.” You have to consult the HCL (Hardware Compatibility List). Is my new HBA card driver compatible with my Fibre Channel switch firmware, which itself must be compatible with my storage array OS version? It’s a house of cards. I’ve seen entire infrastructures become unstable simply because a network card firmware was 3 months behind the one recommended by the array manufacturer.

The Bottleneck (The “I/O Blender Effect”)

This is a fascinating and destructive phenomenon. Imagine 50 VMs on a host.

  • VM 1 writes a large sequential file.
  • VM 2 reads from a database.
  • VM 3 boots up.

At the VM level, operations are clean. But when all these operations arrive at the same time in the storage controller funnel, they get mixed up. What was a nice sequential write becomes a slush of random writes (Random I/O). Traditional array controllers, originally designed for single physical servers, often collapse under this type of load, creating latency perceptible to the end user.

The Hidden Cost

Finally, 3-Tier is expensive. Very expensive.

  • Licensing & Support: You pay for server support, SAN switch support, and array support (often indexed to data volume!).
  • Footprint: As mentioned in the introduction, this equipment consumes enormous amounts of space and electricity.
  • Human Expertise: It often requires a team for compute, a team for network, and a team for storage. Incident resolution times explode (“It’s not the network, it’s storage!” – “No, it’s the hypervisor!”).

Conclusion: A Necessary Foundation

The 3-Tier architecture is not dead. It remains relevant for very specific needs, like massive monolithic databases that require dedicated physical performance guarantees.

However, its management complexity and inability to scale linearly paved the way for a new approach. We started asking the forbidden question: “What if, instead of specializing hardware, we used standard servers and managed everything via software?”

It was this reflection that gave birth to Software-Defined Storage (SDS) and Hyperconvergence (HCI). But that is a topic for our next article.

Read More

You might think that over time, you get used to it. That after two years, opening the email announcing the results becomes a mere administrative formality. Well, I must confess: not at all.

It is with immense pride – and undisguised relief – that I announce my nomination as a Nutanix Technology Champion (NTC) for the year 2026. This is the third consecutive year that I have the honor of joining this group of passionate experts.

To be completely transparent, I never take this distinction for granted. In the IT world, technologies evolve fast, and so do we. Staying relevant requires work, curiosity, and above all, the desire to share. Seeing my name once again on the official NTC 2026 list is a beautiful validation of the efforts put into the blog throughout the year.

What is an “NTC”? (Spoiler: It’s not just a LinkedIn badge)

I am often asked if it is an exam I passed, like an NCP-MCI certification. The answer is no, and that is precisely the beauty of this program.

The Nutanix Technology Champion program does not just reward passing a technical multiple-choice quiz. It is a distinction that recognizes community engagement. Basically, Nutanix spots those who spend their free time testing, breaking, fixing, and above all explaining their technologies to others. Whether through blog posts (like here), forum contributions, or talks at events.

For the purists, it is the equivalent of the vExpert at VMware or the MVP at Microsoft. It is the validation of what we call technical “Soft Skills”: the ability to evangelize a solution not because we are paid to do so, but because we master its intricacies and we love it. It is a recognition by peers and by the vendor, and that is what makes it so rewarding.

Under the Hood: Why this nomination matters for the blog

Beyond the shiny logo to put in a signature, being an NTC has a direct impact on the quality of what I can offer you on juliendumur.fr. It is not an honorary title devoid of meaning; it is a key that opens interesting doors.

Concretely, this status gives me privileged access behind the scenes. I have the opportunity to exchange directly with Product Managers and Nutanix engineering teams. This means that when I write a technical article, I can validate my hypotheses at the source, avoiding approximations.

Furthermore, we have access to roadmap briefings and Beta versions. Even if this information is often under NDA (I can’t reveal everything to you in advance!), it allows me to understand the direction the technology is taking. I can thus better anticipate topics to cover and offer you more relevant analyses as soon as features reach General Availability (GA). It is the assurance for you to read content that is not only technically accurate but also in step with market reality.

Retrospective and 2026 Goals: Full Steam Ahead

This third nomination is the fruit of consistency. But above all, it marks the beginning of a new year of “lab”. The goal is not to collect stars, but to continue exploring the Nutanix Cloud Platform from every angle.

For 2026, I intend to keep offering practical tutorials and field feedback. While the AHV hypervisor remains the unavoidable foundation, I really want to move up the software stack a bit more this year. Expect to see topics covering container orchestration with NKP (Nutanix Kubernetes Platform), automation, and probably a stronger focus on security with Flow. The objective remains the same: dissecting the tech to make it accessible.

A huge thank you to the community for the daily exchanges, and of course to the NTC program team (shout out to Angelo Luciani) for their renewed trust. It is a pleasure to be part of this virtual family.

Now, the ball is also in your court: are there specific topics or features of the Nutanix ecosystem that you would like to see me cover this year? The comments are open!

Read More

I won’t lie to you: when you’ve had a taste of gold, bronze has a peculiar flavor. Last year, I had the immense pride of finishing first in the “Top Bloggers” ranking of the Nutanix Technology Champion (NTC) program.

This year, the verdict is in on the official community blog: I ranked 3rd.

Did I slow down? No. Did I share less? On the contrary. But in tech, just like in sports, staying at the top is often harder than getting there. This 3rd place is, above all, a signal that the competition has intensified. And honestly? It’s exactly what I needed to motivate me to get back in the fight for 2026.

The NTC Program is Not Just a Badge

For those new to the ecosystem, being a Nutanix Technology Champion (NTC) isn’t just about slapping a logo on your LinkedIn profile. It is a commitment. It means being part of a technical vanguard that tests, breaks, fixes, and—above all—documents Nutanix solutions. The “Top Blogger” ranking is the barometer of this activity.

1st in 2024, 3rd in 2025: Analyzing the Logs

So, what happened? I pulled my logs to compare. If my performance had dropped, I would have accepted this 3rd place with a shrug. But the data shows otherwise: my publication volume is equivalent to last year’s. Even better, my strategy was cleaner: instead of doing “bursts” (flurries of articles), I maintained a metronomic consistency, spread evenly over the 12 months.

The conclusion is simple and undeniable: the overall bar has been raised. My peers were absolute beasts this year. They produced more. This is excellent news for the Nutanix community: the ecosystem is alive, dense, and increasingly sharp. But for the competitor in me, it’s a wake-up call. Consistency is no longer enough; just like in cycling, I’m going to have to up the intensity.

Why Publish?

Beyond the rankings and the competition, why continue writing with such discipline? The answer is pragmatic. My blog is primarily my external memory. In our line of work, we don’t remember everything. We test, we configure, we hit a critical error, we resolve it… and six months later, we’ve forgotten how we did it. Blogging is about documenting my own “struggles” so I never have to look for the solution twice. It’s about transforming obscure troubleshooting into a clear tutorial.

But make no mistake: every article is born from a real technical need, from a real infra that I built or fixed. No fluffy theory, just experience from the field. The icing on the cake: the feedback from our clients who stumble upon my blog and tell me, “We found a solution on your site.” That is the real reward.

Conclusion: See You at the Finish Line

Bravo to the two peers who finished ahead of me this year. You set the bar very high, and that is exactly what I like. The level of the NTC program is what makes it credible. But the message has been received. The consistency of 2025 was a good foundation, but for 2026, I’m shifting gears. I’m going to chase more specific topics, dig deeper into the guts of Nutanix AOS and AHV, and perhaps explore use cases that no one has documented yet.

The bronze medal is nice. But it will serve primarily as a reminder on my desk: next year, I’m aiming for the yellow jersey.

See you soon for the next technical article.

Read More