Archives for May 2016


Commvault Best Practices on Nutanix

I first remember seeing Commvault in 2007 in the pages of Network World and thought it looked pretty interesting then. At the time I was a CA ARCserve junkie and prayed every day that I didn’t have to restore anything. Almost 10 years later tape is still around, virtualization has spawned countless backup vendors, and the cloud now makes an easy backup target. Today Commvault is still relevant and plays in all of the aforementioned spaces, and like most tech companies we have our own overlap with them to some degree. For me, Commvault has so many options that it’s almost a problem of what to use where and when.

The newly released Best Practice Guide with Commvault talks about some of the many options that should be used with Nutanix. If I were new to Nutanix and read the guide, the big things that would stand out would probably be the use of a proxy on every host and some of the caveats around IntelliSnap.

Proxy On Every Host

What weighs more? A pound of feathers or a pound of bricks? The point here is that you need a proxy regardless, and the proxy is sized by how much data you will be backing up. So instead of having one giant proxy you now have smaller proxies distributed across the cluster. Smaller proxies can read from the local hot SSD tier and limit network traffic, so they help reduce bottlenecks in your infrastructure.

IntelliSnap is probably one of the most talked about Commvault features. IntelliSnap allows you to create a point-in-time application-consistent snapshot of backup data on the DSF. The backup administrator doesn’t need to log on to Prism to provide this functionality. A Nutanix-based snapshot is created on the storage array as soon as the VMware snapshot is completed; the system then immediately removes the VMware snapshot. This approach minimizes the size of the redo log and shortens the reconciliation process, which reduces the impact on the virtual machine being backed up and minimizes the storage requirement for the temporary file. It also allows near-instantaneous snapshot mounts for data access.

With IntelliSnap it’s important to realize that it was invented at a time when LUNs ruled the storage world. In some sense IntelliSnap treats the giant volumes/containers that Nutanix presents to the hypervisor as one giant LUN. Behind the scenes, when IntelliSnap is used it snaps the whole container, whether or not every VM in it is being backed up. So you should do a little planning when using IntelliSnap. This is OK, since IntelliSnap should be used for highly transactional VMs and not for every VM in the data center. I just like to point out that streaming backups with CBT is still a great choice.
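The planning step can be as simple as checking which containers hold IntelliSnap-protected VMs and which other VMs would get snapped along with them. The following is a minimal sketch only: the function name, the VM names and the container names are made up for illustration, and the VM-to-container mapping is assumed to come from your own inventory rather than from any Commvault or Nutanix API.

```python
from collections import defaultdict


def snapshot_footprint(vm_to_container, intellisnap_vms):
    """Group VMs by container and report, for every container that holds at
    least one IntelliSnap-protected VM, which VMs are protected and which are
    "bystanders" (snapped with the container but not being backed up).

    Both arguments are hypothetical inputs taken from your own inventory.
    """
    protected_set = set(intellisnap_vms)

    # Build container -> set of VM names.
    by_container = defaultdict(set)
    for vm, container in vm_to_container.items():
        by_container[container].add(vm)

    footprint = {}
    for container, vms in by_container.items():
        protected = vms & protected_set
        if protected:  # only containers that will actually be snapped
            footprint[container] = {
                "protected": sorted(protected),
                "bystanders": sorted(vms - protected),
            }
    return footprint


# Illustrative inventory (all names are invented).
inventory = {
    "sql01": "ctr-sql",
    "sql02": "ctr-sql",
    "exch01": "ctr-general",
    "web01": "ctr-general",
    "web02": "ctr-general",
}
report = snapshot_footprint(inventory, ["sql01", "sql02", "exch01"])
for container, info in sorted(report.items()):
    print(container, info)
```

In this made-up inventory, ctr-sql contains only protected VMs, while ctr-general drags two web servers into every snapshot. A result like that suggests moving the highly transactional VMs into a dedicated container.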

With that being said, you can check out the full guide at the Nutanix Website: Commvault Best Practices


Impact of Nutanix VSS Hardware Support

When 4.6 was released I wrote about how the newly added VSS support with Nutanix Guest Tools (NGT) was the gem of the release. It was a fairly big compliment considering some of the important updates that were in the release, like cross-hypervisor DR and another giant leap in performance.

I finally set some time aside to test the impact of taking an application-consistent snapshot with VMware Tools vs. the Nutanix VSS Hardware Support.

When an application-consistent snapshot workflow runs without NGT on ESXi, we take an ESXi snapshot so VMware Tools can be used to quiesce the file system. Every time we take an ESXi snapshot, it results in the creation of delta disks. During this process ESXi “stuns” the VM to remap virtual disks to these delta files. The length of the stun depends on the number of virtual disks attached to the VM and the speed at which the delta disks can be created (the capability of the underlying storage to process NFS metadata update operations, plus releasing/creating/acquiring lock files for all the virtual disks). During this time, the VM is totally unresponsive. No application will run inside the VM, and pings to the VM will fail.

We then delete the snapshot (after backing up the files via hardware snap on the Nutanix side) which results in another set of stuns (deleting a snapshot causes two stuns, one fixed time stun + another stun based on the number of virtual disks). This essentially means that we are causing two or three stuns in rapid succession. These stuns cause meta-data updates in addition to the flushing of data during the VSS snapshot operations.

Customers have reported that VMs running Microsoft clustering can be voted out of the cluster due to heartbeat failure. VMware gives customers guidance on increasing timers if you’re using Microsoft clustering to get around this situation.
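Since a stunned VM stops answering pings, one rough way to put a number on the stuns described above is to ping the VM at a fixed interval while the snapshot runs and then look for the longest run of missed replies. The sketch below is an assumption-laden illustration, not the method used for the tests in this post: the function name and the sample data are hypothetical, and collecting the (timestamp, replied) samples is left to whatever ping tool you prefer.

```python
from typing import List, Tuple


def longest_outage(samples: List[Tuple[float, bool]]) -> float:
    """Return the longest unresponsive window, in seconds.

    `samples` is a time-ordered list of (timestamp_seconds, replied) pairs.
    An outage is measured from the last good reply before a run of failures
    to the first good reply after it. If the capture ends during an outage,
    the window is closed at the final sample's timestamp.
    """
    longest = 0.0
    last_ok = None        # timestamp of the most recent successful reply
    outage_start = None   # set while we are inside a run of failures

    for ts, replied in samples:
        if replied:
            if outage_start is not None:
                longest = max(longest, ts - outage_start)
                outage_start = None
            last_ok = ts
        elif outage_start is None:
            # First failure of a run: start the clock at the last good reply,
            # or at this sample if there has been no good reply yet.
            outage_start = last_ok if last_ok is not None else ts

    if outage_start is not None and samples:
        longest = max(longest, samples[-1][0] - outage_start)
    return longest


# Made-up capture at one-second intervals: two gaps, the first one longer.
capture = [(0, True), (1, True), (2, False), (3, False),
           (4, True), (5, False), (6, True)]
worst = longest_outage(capture)
print(f"longest unresponsive window: {worst} s")
```

Comparing the longest window against the cluster heartbeat timeout gives a quick sense of whether a node is at risk of being voted out during a snapshot.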

To test this out I used HammerDB with SQL 2014 running on Windows 2012 R2. The tests were run on ESXi 6.0 with hardware version 11.


VMware Tools with VSS based Snapshot
I was going to try to stitch the images together because of the time it took but decided to leave as is.


The total process took ~4 minutes.

NGT with VSS Hardware Support based Snapshot
NGT-based VSS snapshots don’t cause VM stuns. The application will be stunned temporarily within Windows to flush the data, but pings and other things should work.


The total process took ~1 minute.


NGT with VSS hardware support is the Belle of the Ball! There is no fixed number for the maximum stun time, since it depends on how heavy the workload is. What we can see, though, is the effect of not using NGT for application-consistent snapshots, and it’s pretty big: roughly 4 minutes versus 1 minute in this test. The collapsing of ESXi snapshots causes additional load and should be avoided if possible. NGT offers a hypervisor-agnostic approach and currently works with AHV as well.

Note: Hypervisor snapshot consolidation is better in ESXi 6 than ESXi 5.5.

Thanks to Karthik Chandrasekaran and Manan Shah for all their hard work and contribution to this blog post.