My View of Hadoop Distributions from the Passenger Seat

This blog post comes from the passenger seat of my Yukon as I head to the lake. It’s a few brief thoughts on Hadoop that wouldn’t quite fit into 140 characters.

Hadoop is here and making its way into the Enterprise. According to IDC, data will grow 50X over the next decade, and that 50X doesn’t even include all the data that will flow through your business. This data represents a competitive advantage; you just need the ability to collect and analyze it.
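To put the 50X figure in more familiar terms, here is a quick back-of-the-envelope conversion into a yearly growth rate. It assumes steady compound growth over the ten years, which is a simplification; the 50X multiple itself is the IDC figure quoted above.

```python
import math


def annual_growth_rate(total_multiple: float, years: int) -> float:
    """Compound annual growth rate implied by growing `total_multiple`-fold over `years`."""
    return total_multiple ** (1 / years) - 1


def doubling_time(rate: float) -> float:
    """Years for data volume to double at a constant annual growth `rate`."""
    return math.log(2) / math.log(1 + rate)


rate = annual_growth_rate(50, 10)
print(f"{rate:.1%} per year, doubling every {doubling_time(rate):.1f} years")
# prints: 47.9% per year, doubling every 1.8 years
```

In other words, under that assumption your data roughly doubles every year and three quarters.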

Hadoop, known mostly for analyzing large datasets in batch processes, is rapidly changing. “Just In Time” processing is now a reality. SQL and NoSQL are getting mashed together, and data stored in HDFS no longer has to be moved out to be analyzed.

Battle of the Distros

MapR – Taking the approach of releasing a more proprietary version of Hadoop. Fast out of the gate, they seem to be doing well. My fear is that they will get too far down one path and won’t be able to use the power of the community. They do have committers on their team, so that should help. They also have partnerships with Google and Amazon.

Cloudera – A mix of proprietary and open source. They have seen success, which can be attributed to the tools they have built to help run and maintain their distribution. There is lots of talk about Impala, which promises super fast query performance against HDFS and HBase. Jeff Hammerbacher, previously at Facebook, gives them a lot of street cred.

Hortonworks – Taking the long-term approach, Hortonworks is 100% open source. They make their revenue from training and support services for their distribution. They have an Impala-like project called Stinger; the difference is that they are still using Hive, just speeding it up by orders of magnitude. I personally dig Hortonworks because they seem to have strong support around virtualizing Hadoop. I also like Hortonworks’ partnership with Microsoft, which is sure to help speed up SQL performance.

Intel – Seems to be focusing on security, making the best use of their CPUs and SSDs for compression and encryption. Personally, I don’t see how that gives them a leg up on the other distributions, as all of them could use Intel’s hardware. Intel seems to be going the OEM route, which is not surprising. I think their relationship with SAP will bode really well for them in the enterprise space.

Please leave your own thoughts; it’s a very interesting landscape.


How to Virtualize Hadoop the Nutanix Way

Before joining Nutanix, I wrote an article just before VMworld 2012, “VMware and Nutanix Give Wings to Hadoop.” Now, with my feet grounded in the company, I think I can deliver some technical content on how to put some wind beneath your wings when virtualizing Hadoop.

One of the bottlenecks with virtualizing Hadoop comes from its distributed nature. Traditional SANs usually don’t scale as servers are added, because typically two storage controllers are shared by all the servers. This model is not conducive to a TeraSort benchmark. Nutanix virtualizes the storage controller, so as you add servers, the controllers don’t become the bottleneck.
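The shared-controller problem can be illustrated with a toy model: the cluster gets whichever is smaller, the bandwidth the servers ask for or the bandwidth the controllers can serve. This is a simplified sketch, not a Nutanix or SAN sizing tool; the per-node demand and per-controller capacity below are hypothetical numbers in arbitrary units (think MB/s), chosen only to show the shape of the curve.

```python
# Toy model of usable storage bandwidth as a Hadoop cluster grows.
# All numbers are hypothetical and illustrative, not measurements.


def usable_bandwidth(nodes: int, demand_per_node: int,
                     controllers: int, capacity_per_controller: int) -> int:
    """Bandwidth the cluster actually gets: limited by what the nodes
    ask for or by what the storage controllers can serve, whichever is smaller."""
    demand = nodes * demand_per_node
    supply = controllers * capacity_per_controller
    return min(demand, supply)


def shared_san(nodes: int) -> int:
    # Traditional SAN: two controllers shared by every server.
    return usable_bandwidth(nodes, demand_per_node=500,
                            controllers=2, capacity_per_controller=1000)


def controller_per_node(nodes: int) -> int:
    # Virtualized controller on each node: controller count grows with the cluster.
    return usable_bandwidth(nodes, demand_per_node=500,
                            controllers=nodes, capacity_per_controller=1000)


if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        print(f"{n:>2} nodes: shared SAN {shared_san(n):>6}, "
              f"controller per node {controller_per_node(n):>6}")
```

With these made-up numbers the shared SAN flattens out at 2,000 units from four nodes onward, while the per-node design keeps growing linearly, which is the behavior a scale-out job like TeraSort needs.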



Squeezing the Last Drop for VDI and HPC – Nutanix’s Replication Factor

So I apologize in advance to the Nutanix Support team. The beauty of Nutanix, with its Google-like file system NDFS (Nutanix Distributed File System), is that it pairs ease of use with performance. A distributed architecture that scales, with massive parallelism at 25,000 IOPS a block, is a lot to handle, but some people are just plain greedy. If you see the work that Jim Moyle has done with Windows 7, you realize a Win 7 desktop will eat as many IOPS as it can get its hands on.
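A rough way to see why the replication factor (RF) matters when you are chasing every last IOP is to count backend I/Os. This is a hypothetical sizing sketch, not a Nutanix specification: it treats the 25,000 IOPS per block as the block’s total backend I/O budget, assumes each read is served from one copy and each write lands on RF copies, and uses a made-up desktop profile (20 IOPS, 80% writes). Whether a given RF setting is supported or advisable is outside its scope.

```python
BLOCK_IOPS = 25_000  # per-block figure quoted in the post


def desktops_per_block(desktop_iops: int, write_pct: int, rf: int,
                       block_iops: int = BLOCK_IOPS) -> int:
    """Number of desktops that fit in the block's I/O budget under a simple model.

    Each desktop issues `desktop_iops`, of which `write_pct` percent are writes.
    Assumption: reads hit one copy; every write is stored `rf` times.
    """
    # Backend I/Os per desktop, scaled by 100 to stay in integer arithmetic.
    backend_x100 = desktop_iops * ((100 - write_pct) + write_pct * rf)
    return block_iops * 100 // backend_x100


if __name__ == "__main__":
    # Hypothetical desktop profile: 20 IOPS, 80% writes.
    for rf in (1, 2, 3):
        print(f"RF{rf}: {desktops_per_block(20, 80, rf)} desktops")
```

Under these assumptions, moving from RF2 to RF1 takes the block from 694 to 1,250 desktops, which is exactly the kind of trade-off that would make a support team nervous.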


Hadoop Terminology Cheat Sheet

I found this slide in a Cloudera deck and thought it would be great to share. Hadoop projects and terms can get kind of confusing with all the different animal references that seem to be used.

Both Cloudera and Hortonworks have great resources for learning Hadoop and are making advances in availability and reliability. I’m hoping to start an HPC/Hadoop section on this blog.