I see more and more people looking at getting started with Hadoop, but it can be risky if you don’t have the skills and time in your organization. To top that off, you have to buy new equipment that will most likely only be used 20% of the time.
Nutanix has always supported mixed workloads, but Hadoop can be blunt force trauma on storage for a variety of reasons:
1) Hadoop was never built for shared storage. It was architected around data locality, which is also a core architectural design feature of Nutanix.
2) Large ingests and working sets can destroy the storage cache for your remaining workloads. With Nutanix you can bypass the flash tier for sequential traffic. If the workloads were sized properly, the HDDs are usually relatively inactive, as they are meant to be cold storage. Using the idle disks for Hadoop gives infrastructure and Hadoop teams the ability to test the waters before carrying on.
In the case of customers running NX-8150, they might never need to buy extra nodes for compute. With 20 HDDs at your disposal, the raw disk gives great performance without flash. If your performance is fine running just from HDD, you can save additional cost by adding storage-only nodes. The storage-only nodes don’t require additional licensing from Cloudera or Hortonworks.
Performance on Cloudera with 4 nodes of NX-8150 using no flash
In the above case CPU was only at 50%, so you could run additional workloads even while Hadoop was running. If your goal is just Test/Dev, you can also turn the HDFS replication factor down to 1, since Nutanix already provides enterprise-class redundancy. When you add in erasure coding, the raw capacity consumed will be less than 2X the size of your data, compared to 3X with traditional Hadoop.
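To make the capacity comparison concrete, here is a rough back-of-the-envelope sketch. It assumes that traditional Hadoop keeps 3 HDFS copies on bare disks, that the Nutanix cluster keeps 2 copies of every block before erasure coding, and that erasure coding uses a 4 data + 1 parity strip. The strip size and the function names are my own illustration, not measured values, and in practice only part of the data ends up erasure coded, so the real number should land somewhere between the last two results.

```python
def ec_overhead(data_blocks: int, parity_blocks: int) -> float:
    """Raw blocks stored per data block for an erasure-coded strip."""
    return (data_blocks + parity_blocks) / data_blocks


def raw_per_usable(hdfs_replication: int, platform_overhead: float) -> float:
    """Raw capacity consumed per unit of Hadoop data.

    hdfs_replication: number of copies HDFS itself writes.
    platform_overhead: multiplier added by the storage layer underneath.
    """
    return hdfs_replication * platform_overhead


# Traditional Hadoop: HDFS keeps 3 copies, bare disks add nothing.
traditional = raw_per_usable(3, 1.0)

# HDFS replication set to 1, storage layer mirrors every block (assumed 2 copies).
nutanix_mirrored = raw_per_usable(1, 2.0)

# HDFS replication set to 1, all data erasure coded with an assumed 4+1 strip.
nutanix_ec = raw_per_usable(1, ec_overhead(4, 1))

print(f"Traditional Hadoop:      {traditional:.2f}x")
print(f"HDFS RF1 + mirroring:    {nutanix_mirrored:.2f}x")
print(f"HDFS RF1 + 4+1 EC strip: {nutanix_ec:.2f}x")
```

Swap in your own strip size and the share of data you expect to be cold enough for erasure coding to get a figure closer to your environment.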
Please hit me up on twitter @dlink7 if you have any questions.