Why Virtualize Hadoop?

Shortly after coming to Nutanix I wrote an article on virtualizing Hadoop. One of the points of the article was:

One of the bottlenecks with virtualizing Hadoop is its distributed nature. Traditional SANs usually don’t scale as servers are added; typically two storage controllers are shared by all the servers. This model is not conducive to a Terasort benchmark. Nutanix virtualizes the storage controller, so as you add servers the controllers don’t become the bottleneck.

Because of the above point I believe Nutanix makes virtualization a reality for Hadoop, and this is also why the “Grey Beards” of the world cringe when they hear SAN and Hadoop in the same sentence. With that as background, here are a few points on why virtualizing Hadoop can save time and money.

1) BIG DATA PRIVATE CLOUD – Allow developers and analysts to provision their own Hadoop clusters on demand for test-and-dev and ephemeral jobs, sharing resources with other Hadoop clusters or completely different applications for better datacenter utilization. This is one of the main reasons to virtualize in the first place: rarely, if ever, are all of the compute resources in use. Depending on your business, VDI in the day, Hadoop at night!

2) MANAGEABILITY – Run Hadoop on the same hardware as your other production workloads, using the same monitoring and management tools you know and love. Most infrastructure teams are new to Hadoop, so reusing familiar tooling lets you get started without changing lots of internal processes. On Nutanix you also get dynamic cluster expansion and rolling upgrades!

3) SECURITY & MULTI-TENANCY – Provision separate Hadoop clusters to keep data separate for different business units and prevent runaway jobs from overtaking your cluster.

4) PERFORMANCE – When virtualizing Hadoop on Nutanix you get the benefit of auto-tiering with SSD. Most Hadoop jobs touch only a portion of the data, so flash can be transparently leveraged to accelerate MapReduce jobs. Even without auto-tiering, VMware has shown that running multiple virtual machines per node can be faster than bare metal.

Faster than Bare Metal

5) HIGH AVAILABILITY – VMware High Availability and Fault Tolerance can be used to protect the Job Tracker and the Name Node/Secondary Name Node. From a Nutanix perspective, replication, snapshots, compression, and dedupe can all be applied to your Hadoop environment. If you’re moving your data into HDFS, how do you plan to back it up? How long can you be down? There are ways to do it today, but they aren’t pretty and they tie up resources.
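To make the backup question concrete, here is a minimal sketch of one native approach available in newer Hadoop releases: enable HDFS snapshots on a directory, then copy a point-in-time snapshot to a second cluster with distcp. The paths, snapshot name, and `backup-nn` NameNode address are hypothetical, and this is exactly the kind of “not pretty” approach mentioned above — the distcp job still consumes cluster resources while it runs.

```shell
# Allow snapshots on a hypothetical data directory (run as an HDFS admin)
hdfs dfsadmin -allowSnapshot /data/warehouse

# Take a point-in-time snapshot; this is a metadata operation,
# so jobs can keep running against the live directory
hdfs dfs -createSnapshot /data/warehouse nightly-backup

# Copy the consistent snapshot to a second cluster with distcp
# (backup-nn is a hypothetical NameNode for the backup cluster)
hadoop distcp /data/warehouse/.snapshot/nightly-backup \
    hdfs://backup-nn:8020/backups/warehouse

# Clean up the snapshot once the copy completes
hdfs dfs -deleteSnapshot /data/warehouse nightly-backup
```

The snapshot itself is near-instant, but the distcp copy is a MapReduce job, which is the resource tie-up alluded to above.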

6) STARTING SMALL – No one deploys and uses a 300-node cluster on day 1; it’s a lot like VDI from a deployment perspective. Nutanix and virtualization help you start small and grow into the right-sized environment.

Other Resources:

Protecting Hadoop with VMware vSphere 5 Fault Tolerance

Big Data Extensions for vSphere
