Impact of Nutanix VSS Hardware Support

When 4.6 was released I wrote about how the newly added VSS support with Nutanix Guest Tools (NGT) was the gem of the release. It was a fairly big compliment considering some of the important updates in that release, like cross-hypervisor DR and another giant leap in performance.

I finally set some time aside to test the impact of taking an application-consistent snapshot with VMware Tools versus the Nutanix VSS hardware support.

In an application-consistent snapshot workflow without NGT on ESXi, we take an ESXi snapshot so that VMware Tools can be used to quiesce the file system. Every ESXi snapshot results in the creation of delta disks. During this process ESXi “stuns” the VM to remap its virtual disks to these delta files. The length of the stun depends on the number of virtual disks attached to the VM and the speed at which the delta disks can be created (the capability of the underlying storage to process NFS metadata update operations, plus releasing, creating, and acquiring lock files for all the virtual disks). During this time the VM is totally unresponsive: no application runs inside the VM, and pings to the VM fail.

We then delete the snapshot (after backing up the files via a hardware snapshot on the Nutanix side), which results in another set of stuns: deleting a snapshot causes two stuns, one fixed-time stun plus another that depends on the number of virtual disks. This essentially means that we are causing two or three stuns in rapid succession. These stuns cause metadata updates in addition to the flushing of data during the VSS snapshot operations.
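To make the arithmetic of the stuns concrete, here is a toy model in Python. It is only a sketch: the per-disk and fixed-time costs are hypothetical parameters, not measured ESXi values, and the class and function names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class StunModel:
    """Toy model of VM stuns around one ESXi snapshot cycle (hypothetical costs)."""
    per_disk_seconds: float      # assumed time to remap one virtual disk
    fixed_delete_seconds: float  # assumed fixed-time stun on snapshot deletion

    def create_stun(self, disks: int) -> float:
        # Snapshot creation: one stun that grows with the number of virtual disks.
        return disks * self.per_disk_seconds

    def delete_stuns(self, disks: int) -> list[float]:
        # Snapshot deletion: one fixed-time stun plus one per-disk stun.
        return [self.fixed_delete_seconds, disks * self.per_disk_seconds]

    def total(self, disks: int) -> float:
        # Sum of all stuns for one create-and-delete cycle.
        return self.create_stun(disks) + sum(self.delete_stuns(disks))


def exceeds_heartbeat(model: StunModel, disks: int, heartbeat_timeout: float) -> bool:
    """True if the longest single stun would outlast a cluster heartbeat timeout."""
    longest = max([model.create_stun(disks), *model.delete_stuns(disks)])
    return longest > heartbeat_timeout
```

Plugging in your own observed numbers shows how quickly a VM with many virtual disks can approach a clustering heartbeat limit.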

Customers have reported that VMs running Microsoft clustering can be voted out of the cluster due to heartbeat failure. VMware gives customers guidance on increasing timers if you’re using Microsoft clustering to get around this situation.

To test this out I used HammerDB with SQL Server 2014 running on Windows Server 2012 R2. The tests were run on ESXi 6.0 with hardware version 11.


VMware Tools with VSS based Snapshot
I was going to try to stitch the images together because of the time it took but decided to leave as is.


The total process took ~4 minutes.

NGT with VSS Hardware Support based Snapshot
NGT-based VSS snapshots don’t cause VM stuns. The application is stunned temporarily within Windows to flush the data, but pings and other network traffic should keep working.


The total process took ~1 minute.


NGT with VSS hardware support is the belle of the ball! There is no fixed number for the maximum stun time, since it depends on how heavy the workload is, but what we can see is the effect of not using NGT for application-consistent snapshots, and it’s pretty big. Collapsing ESXi snapshots causes additional load and should be avoided if possible. NGT offers a hypervisor-agnostic approach and currently works with AHV as well.

Note: Hypervisor snapshot consolidation is better in ESXi 6 than ESXi 5.5.

Thanks to Karthik Chandrasekaran and Manan Shah for all their hard work and contribution to this blog post.


Save Your Time With Nutanix Automatic Support

Best Industry Support

The feature known as Pulse is enabled by default and sends cluster status information automatically to Nutanix customer support. After you have completed initial setup, created a cluster, and opened ports 80 or 8443 in your firewall, AOS sends a Pulse message from each cluster once every 24 hours. Each message includes cluster configuration and health status that can be used by Nutanix Support to address any cluster operation issues.

AOS can also send automatic alert email notifications to Nutanix Support by default through ports 80 or 8443. As with Pulse, any configured firewall must have these ports open. Some examples of conditions that automatically generate a proactive case with Nutanix Support at priority level P4:

The Stargate process is down for more than 3 hours
Curator scan fails
Hardware Clock Failure
Faulty RAM module
Power Supply failure
Unable to fetch IPMI SDR repository (IPMI Error)
HyperV networking
System operations
Disk Capacity > 90%
Bad Drive

You can optionally use your own SMTP server to send Pulse and alert notifications. If you do not or cannot configure an SMTP server, another option is to implement an HTTP proxy as part of your overall support scheme.
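Before relying on Pulse, it can help to confirm that outbound connections on ports 80 and 8443 actually get through your firewall. The sketch below is a generic TCP reachability check in Python; the target host is deliberately left as a parameter because this post does not name the Nutanix endpoint, and the function names are made up for illustration.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def pulse_ports_open(host: str, ports=(80, 8443), timeout: float = 3.0) -> dict[int, bool]:
    """Check each port named in the post; the host is supplied by the caller."""
    return {port: can_connect(host, port, timeout) for port in ports}
```

A result such as `{80: True, 8443: False}` would suggest that only one of the two ports is open from where the check was run.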

While the best thing is never to get a call, second best is not waiting in line to open a ticket. Have a great week!


3rd Generation Erasure Coding (EC-X) – What’s Next?

Take time for all things: great haste makes great waste. Benjamin Franklin

I don’t profess to be an erasure coding genius, but I know enough to say that it would be a very poor choice for workloads that have lots of overwrites or cycle through lots of snapshots, and that running erasure coding inline would really only suit a WORM application, which is not typical for a lot of virtual environments. Nutanix first released erasure coding as EC-X in AOS 4.1.3 as a tech preview and has learned a lot along the way with its agile software development method.

With AOS 4.6.1 released on April 18th, more improvements were added to EC-X.

    Faster reclamation – simply put, if your EC strip is changed, holes start appearing in it. You need an efficient way of plugging the holes so the strip can be encoded again.
    Advanced EC-X selection heuristics – Nutanix engineering has come up with an algorithm to determine whether to use blocks from the same virtual hard drive or blocks from throughout the container. Better selection reduces the need to fix strips and reduces CPU load on the cluster. This also helps with the problem of cycling through lots of snapshots.
    Strip compaction – If an EC-X strip has too many holes, the system won’t even try to fill the gaps. Instead it moves the data out of the strip.
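The refill-versus-compact decision described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Nutanix’s actual heuristic: the `Strip` type, the `plan` function, and the 50% threshold are all assumptions made up for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    KEEP = "keep"        # strip has no holes, nothing to do
    REFILL = "refill"    # plug the holes and encode the strip again
    COMPACT = "compact"  # too many holes: move the live data out of the strip


@dataclass
class Strip:
    live: list[bool]  # True = block still holds live data, False = hole

    @property
    def hole_ratio(self) -> float:
        if not self.live:
            return 0.0
        return self.live.count(False) / len(self.live)


def plan(strip: Strip, compact_threshold: float = 0.5) -> Action:
    """Pick an action for a strip; the threshold is a hypothetical tunable."""
    if strip.hole_ratio == 0:
        return Action.KEEP
    if strip.hole_ratio > compact_threshold:
        return Action.COMPACT
    return Action.REFILL
```

A strip with one hole out of four blocks would be refilled, while a strip with three holes out of four would be compacted under this assumed threshold.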

With the mission to enable the enterprise cloud, more and more of the features are becoming self-adjusting to truly allow for set and forget. The end goal is to have all the features turned on and let the system decide. I am looking forward to watching the announcements at .Next in June.



Tech Preview: Nutanix Delivers Native File Services

Acropolis File Services provides a highly available solution for hosting user and shared department files in a centralized location with a single namespace. Acropolis File Services removes the burden of manual configuration and the need for Active Directory knowledge and load balancing expertise, and scaling the solution is controlled through the Nutanix Prism UI. The Acropolis File Server will greatly simplify non-persistent desktop environments and other use cases that require shared storage for user data.

Customers can deploy Acropolis File Services on existing AHV clusters in the Tech Preview. The Acropolis File Server benefits from all of the storage-centric features of the Acropolis Distributed Storage Fabric, so management and scaling are easy and intuitive. Compression, dedupe and erasure coding (EC-X) can all be used with the Acropolis File Server, and if performance needs are met with existing nodes, storage-only nodes can be used to increase capacity.

As shares are created on the Acropolis File Server, load is distributed across the cluster seamlessly to the end user. Daily datacenter operations are covered as you scale up the Acropolis File Server, add physical nodes to the Nutanix cluster, and deal with typical failure scenarios.


If a share called \\FileServerNameFQN\Users was created and contains the top-level directories \Bob, \Becky, and \Kevin, then \Bob would be on, say, VM1, \Becky on VM2, \Kevin on VM3, and so on. A string hashing algorithm based on the directory names is used to distribute the top-level directories. This makes non-persistent VDI very easy to set up and deliver without running into bottlenecks.
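As a rough illustration of hashing directory names to file server VMs, here is a minimal Python sketch. The post does not say which hash function Acropolis File Services uses, so SHA-256, the case folding, and the `owner_vm` helper are assumptions chosen only to show the technique.

```python
import hashlib


def owner_vm(directory: str, vms: list[str]) -> str:
    """Map a top-level directory name to one VM by hashing the name.

    SHA-256 and the lowercasing step are illustrative assumptions,
    not the algorithm the product is documented to use.
    """
    if not vms:
        raise ValueError("at least one file server VM is required")
    digest = hashlib.sha256(directory.lower().encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(vms)
    return vms[index]


if __name__ == "__main__":
    file_server_vms = ["VM1", "VM2", "VM3"]
    for name in ("Bob", "Becky", "Kevin"):
        print(name, "->", owner_vm(name, file_server_vms))
```

Because the placement depends only on the directory name and the VM list, every client resolves the same directory to the same VM without any lookup table. Note that a plain modulo scheme like this reshuffles many directories when the VM count changes, which a real implementation would have to handle.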

Watch the below video to see how easy it is to deploy and expand file services in your Nutanix cluster.

<Corporate Announcement – File Services>

New ‘Acropolis File Services’ with Native Support for VMware Horizon UEM and Citrix Profile Manager


Commvault IntelliSnap & Metro Availability with Nutanix

I was asked if Commvault IntelliSnap works with Metro Availability on Nutanix, and I wasn’t 100% certain because of the snapshots that the Metro protection domain would have to take. After giving it a quick test, it turns out it works just fine.

Setting your retention policy

I would just take into account that if you have separate jobs running, each job will take a snapshot of the container, and then your retention policy will come into effect for the protection domain.

Below is a quick video of the process in action.


Nutanix 4.6 – Scheduling Security Configuration Management Automation

All of the advanced security settings are controlled with NCLI in Acropolis 4.6. The security-related commands are under the cluster object. NCLI is very tab friendly, so you really don’t need to memorize the commands. The schedule command refers to Nutanix Security Configuration Management Automation, which runs the system checks to make sure your system is compliant. The default schedule is DAILY. It can be set to HOURLY, DAILY, WEEKLY or MONTHLY. All of the other settings can be set to TRUE or FALSE.

There are separate commands for Storage and the Acropolis Hypervisor.

Storage Commands

ncli> cluster edit-cvm-security-params schedule=hourly

Enable Aide : true
Enable Core : true
Enable High Strength P… : false
Enable Banner : true
Enable SNMPv3 Only : true
Schedule : HOURLY

Acropolis Hypervisor Commands

ncli> cluster edit-hypervisor-security-params schedule=HOURLY

Enable Aide : true
Enable Core : true
Enable High Strength P… : false
Enable Banner : false
Schedule : HOURLY
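
If you script these changes, a small helper can validate the schedule value before the command is typed or sent to NCLI. The sketch below only builds the two command strings shown above; the helper name and the `cvm`/`hypervisor` keys are made up for illustration, and how the string reaches NCLI is left to you.

```python
VALID_SCHEDULES = {"HOURLY", "DAILY", "WEEKLY", "MONTHLY"}

# Subcommand names are taken from the two examples in this post.
SUBCOMMANDS = {
    "cvm": "edit-cvm-security-params",
    "hypervisor": "edit-hypervisor-security-params",
}


def schedule_command(target: str, schedule: str) -> str:
    """Build the NCLI command line that sets the SCMA schedule for a target."""
    if target not in SUBCOMMANDS:
        raise ValueError(f"unknown target {target!r}; expected one of {sorted(SUBCOMMANDS)}")
    value = schedule.upper()
    if value not in VALID_SCHEDULES:
        raise ValueError(f"invalid schedule {schedule!r}; expected one of {sorted(VALID_SCHEDULES)}")
    return f"cluster {SUBCOMMANDS[target]} schedule={value}"


if __name__ == "__main__":
    print(schedule_command("cvm", "hourly"))
    print(schedule_command("hypervisor", "weekly"))
```

Rejecting a typo such as `schedule=yearly` up front is cheaper than discovering it in the command output later.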


The Gem of 4.6: NGT with VSS Hardware Support

One of my favorite areas of the 4.6 release is data protection, and the hidden gem has to be Nutanix Guest Tools (NGT). NGT really enables a lot of features: cross-hypervisor DR, Dial (change ESXi to AHV and back again), the Nutanix VSS hardware provider, and Self Service Restore.

Nutanix Guest Tools also includes the Nutanix Guest Agent (NGA) service, which communicates with the Nutanix Controller VM, and the Nutanix VM Mobility Drivers. The drivers facilitate VM migration between ESXi and AHV, in-place hypervisor conversion, and the cross-hypervisor disaster recovery features.


NGT is enabled from within Prism. Simply select Enable NGT and an ISO will be mounted to the virtual machine.


After mounting NGT on a VM, you can configure your Windows machine to use NGT. Log into the Windows guest VM, double-click the Nutanix icon labeled X, and away you go. Accept the license agreement and follow the prompts to configure NGT on the virtual machine. After installation finishes, the Nutanix guest agents are installed on the VM and you can use all the NGT features (self-service restore, cross-hypervisor disaster recovery, application-consistent snapshots with VSS on AHV, or in-place hypervisor conversion from ESXi to AHV and AHV to ESXi).

While cross-hypervisor disaster recovery will get a lot of coverage, I think Nutanix’s VSS hardware support is the best news. Now application-consistent snapshots can take advantage of the Nutanix framework and services such as Microsoft Volume Shadow Copy Service (VSS) to quiesce the VM and supported applications, rendering them into a known or consistent state. For systems using ESXi or the Acropolis Hypervisor (AHV) running Microsoft Windows guests, the Nutanix Guest Agent running in the guest OS provides VSS support. Using the native Nutanix VSS hardware provider, the Nutanix Guest Agent is called to quiesce the OS and supported applications such as Microsoft Exchange and SQL Server before XCP takes an application-consistent snapshot of the VM. Application quiescence times are lower than those previous hypervisor-based snapshot tools could deliver. Lower quiescence times help to keep application performance constant and reduce the I/O required when collapsing a hypervisor-based snapshot. To my knowledge this is a first for HCI, not relying on VMware Tools (VMware snapshots), so it’s a big step for supporting the largest SQL databases.

Supported Operating Systems for NGT are:


    Windows 2008 R2 or later versions
    Windows 7 or later versions


    CentOS 6.5 and 7.0
    Red Hat Enterprise Linux (RHEL) 6.5 and 7.0
    Oracle Linux 6.5 and 7.0
    SUSE Linux Enterprise Server (SLES) 11 SP4 and 12
    Ubuntu 14.04 or later releases

Powering Dell XC – Erasure Coding and VM Flash Mode

This is a short overview of the software powering the Dell XC hardware. I wanted to talk about some technical features since it was a Tech Field Day, so I landed on Erasure Coding (EC-X) and VM Flash Mode. One thing I also like to stress when going into HCI is management. The fact that you are splitting up a traditional storage array into multiple individual parts should not be lost on people. Of all the things the Acropolis software delivers, its management is what will allow Nutanix to compete with the cloud providers of the world. In the land of high availability, we humans are the most dangerous thing to happen to hardware, not the components themselves.

Session from Tech Field Day 10:

Some great additional resources:

Nutanix XCP VM Flash Mode – Enable SSD performance in a Hybrid System

Nutanix – Erasure Coding (EC-X) Deep Dive


Dell on Why an OEM Agreement Matters with Hyper-Converged Infrastructure

It was my first time attending Tech Field Day as a presenter. I had been an attendee twice and was a long-time follower of the event. I’ve always thought the independent (as much as one can be) guests that Tech Field Day brings in are some of the best in the industry. This event was no different: Forbes, storage architects, virtualization experts, Exchange gurus, and the list goes on. There really isn’t much that gets past this diverse crew.

Senior Lewie Newcomb, Executive Director, Storage Product Group of Dell, started the show for Dell. Lewie does a great job of explaining why hardware matters and how an OEM relationship forges an appliance like Dell XC that delivers over and above a software-only play. Lewie goes on to comment that it’s one of the most successful products he has dealt with at Dell.

Watch the below segment on Dell’s SDS strategy and how the journey started with Nutanix.


Commvault IntelliSnap and Nutanix Video

A quick how-to video that even shows how you could restore to Amazon if needed.