How to Virtualize Hadoop the Nutanix Way

Prior to working at Nutanix, I wrote an article just before VMworld 2012, “VMware and Nutanix Give Wings to Hadoop”. Now, with my feet grounded in the company, I think I can deliver some technical content on how to put some wind beneath your wings when virtualizing Hadoop.

One of the bottlenecks with virtualizing Hadoop is its distributed nature. Traditional SANs usually don’t scale as servers are added; typically two storage controllers are shared by all the servers. This model is not conducive to a Terasort benchmark. Nutanix virtualizes the storage controller, so as you add servers the controllers don’t become the bottleneck.
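To make the scaling argument concrete, here is a rough back-of-the-envelope sketch. The throughput figure is a made-up placeholder, not a measured number for any SAN or Nutanix system, and the even split of bandwidth across servers is a simplifying assumption.

```python
# Illustrative model only: CONTROLLER_MBPS is a hypothetical figure,
# and bandwidth is assumed to be shared evenly across servers.
CONTROLLER_MBPS = 1000


def per_server_bandwidth(num_servers, num_controllers, controller_mbps):
    """Aggregate controller bandwidth divided evenly across all servers."""
    if num_servers < 1 or num_controllers < 1:
        raise ValueError("need at least one server and one controller")
    return num_controllers * controller_mbps / num_servers


for servers in (4, 8, 16, 32):
    # Traditional SAN: two controllers shared by every server.
    shared = per_server_bandwidth(servers, 2, CONTROLLER_MBPS)
    # Scale-out: one virtualized controller per server.
    scale_out = per_server_bandwidth(servers, servers, CONTROLLER_MBPS)
    print(f"{servers:>2} servers: shared {shared:7.1f} MB/s, "
          f"scale-out {scale_out:7.1f} MB/s per server")
```

With two shared controllers, the per-server share shrinks every time a server is added. With one controller per server it stays flat, which is the point of virtualizing the controller.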

The second issue with Hadoop is that lots of temporary data is created when doing Map Reduce jobs. Putting these temporary files onto a system where they incur a RAID or replication penalty wouldn’t make a whole bunch of sense. It’s only after the Reduce job ends that the results are stored on a highly reliable filesystem. If the job fails before finishing, the task is simply rerun on another node in the Hadoop cluster.
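The rerun behaviour is why unprotected temporary data is acceptable. The following is a toy sketch of the idea, not Hadoop’s actual scheduler; the function names, node names, and the simulated failure are all hypothetical.

```python
def run_with_rerun(task, nodes):
    """Try the task on each node in turn; return (node, result) on first success."""
    last_error = None
    for node in nodes:
        try:
            return node, task(node)
        except Exception as err:
            # Temporary output on the failed node is simply abandoned;
            # nothing needs to be recovered from it.
            last_error = err
    raise RuntimeError("task failed on every node") from last_error


def flaky_map_task(node):
    """Hypothetical task that fails on node-a to simulate losing temp files."""
    if node == "node-a":
        raise IOError("node-a lost its temporary files")
    return f"map output from {node}"


print(run_with_rerun(flaky_map_task, ["node-a", "node-b"]))
```

Because the work can be redone elsewhere, paying for a second copy of every intermediate file buys very little.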

Hadoop’s Distributed File System (HDFS) uses the concept of data locality, much like the Nutanix Distributed File System. Hadoop breaks up the Map Reduce jobs into “Splits”. Ideally, Hadoop wants each split to be stored on one node, which saves network traffic and helps with efficiency.
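A simplified way to picture data locality is a greedy placement that prefers a node already holding the split’s data. This is an illustration of the concept under my own assumptions, not the algorithm Hadoop or Nutanix actually uses; the split names, node names, and slot counts are hypothetical.

```python
def assign_splits(split_locations, free_slots):
    """Place each split on a node that holds its data when a slot is free.

    split_locations: dict mapping split name -> list of nodes holding the data
    free_slots: dict mapping node name -> number of free task slots
    Returns a dict mapping split name -> (node, is_local).
    """
    slots = dict(free_slots)  # copy so the caller's dict is left untouched
    assignment = {}
    for split, holders in split_locations.items():
        local = [n for n in holders if slots.get(n, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            # No local slot: fall back to any node with capacity,
            # which means the data must cross the network.
            candidates = [n for n, s in slots.items() if s > 0]
            if not candidates:
                raise RuntimeError(f"no free slot for {split}")
            node, is_local = candidates[0], False
        slots[node] -= 1
        assignment[split] = (node, is_local)
    return assignment


placement = assign_splits(
    {"split-0": ["node-a"], "split-1": ["node-a"], "split-2": ["node-b"]},
    {"node-a": 1, "node-b": 2},
)
for split, (node, is_local) in placement.items():
    print(split, node, "local" if is_local else "remote read")
```

In this example split-1 lands on node-b because node-a has no slot left, so its data has to travel over the network. Every placement that stays local avoids that traffic.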


The above image tells the story of how Nutanix comes to the rescue. Nutanix uses a concept called Replication Factor (RF) to enable resiliency. If you’re familiar with Hadoop, it uses the same concept. An RF of 2 means there are two copies of the data on separate nodes within the cluster. An RF of 1 means there will only be one copy of the data. The great thing about an RF of 1 is that you get the speed of staying on the PCIe bus without the limitations that usually come with local storage: you are still able to vMotion your virtual machine around without the cost of doing a storage vMotion.
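The cost of the extra copy is easy to estimate. The sketch below assumes the first copy is written on the local node and every additional copy is sent to another node over the network; that placement is an assumption made for illustration, and the 100 GB figure is made up.

```python
def write_cost(logical_gb, rf):
    """Return (total_gb_written, gb_sent_over_network) for a replication factor.

    Assumes one copy stays on the local node and the remaining rf - 1
    copies are written to other nodes.
    """
    if rf < 1:
        raise ValueError("replication factor must be at least 1")
    total = logical_gb * rf
    remote = logical_gb * (rf - 1)
    return total, remote


TEMP_DATA_GB = 100  # hypothetical amount of temporary Map Reduce data

for rf in (1, 2):
    total, remote = write_cost(TEMP_DATA_GB, rf)
    print(f"RF {rf}: {total} GB written in total, {remote} GB over the network")
```

Under these assumptions, an RF of 2 doubles the data written and pushes a full copy across the network, while an RF of 1 keeps all of the temporary data local.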

The following command sets the RF to 1 on your container/NFS volume on Nutanix. The ncli can be run from any one of the Storage Controller Virtual Machines, or you can download the needed files from the admin console. id is the ID of the container.

ncli ctr edit id= rf=1

The replication factor can also be applied at the Persistent Cache Tier. This setting is system wide, so you will want to exercise extreme caution and understand its impact before setting it.

ncli ctr edit id= enable-oplog-ha=false

The website below explains the impact of these changes on IO flowing through the Nutanix cluster.

A day in a life of a Nutanix storage IO

For more information on running Hadoop virtualized on Nutanix, please download the reference architecture posted under Technical Guides.


  1. Samuel Rothenbuehler says:


    If I create a container with RF1 and deploy a VM on that container, that VM will write all data to the local disks only. So far this makes sense. If we allow that VM to be executed only on that Nutanix node, is there a guarantee that none of that data will move to another host/node in the backend? I know that there is a rebalancing functionality as part of NDFS which redistributes data between Nutanix nodes in case some nodes are less used than others. If there is no guarantee the RF1 data won’t leave the node, is there a way to see on which node(s) a Vdisk is stored? And is there a command to force all data of a given Vdisk to be migrated back to a single Nutanix node?

    Thanks for your help!

    P.S.: The scenario we are looking at is an Elasticsearch (big data) application cluster which uses its own replication mechanism on top of Nutanix. We want to deploy one Elasticsearch VM on each Nutanix node, with its own copy stored only locally (RF1). We have to make sure that all data will remain on that Nutanix node’s disks, so that a single Nutanix node failure cannot suddenly mean the loss of Elasticsearch data of two VMs running on two different Nutanix nodes.


