My View of Hadoop Distributions from the Passenger Seat

This blog post comes from the passenger seat of my Yukon as I head to the lake. It’s a brief set of thoughts on Hadoop that wouldn’t really fit into 140 characters.

Hadoop is here and making its way into the enterprise. According to IDC, data will grow 50X over the next decade, and that 50X doesn’t even include all the data that will flow through your business. This data represents competitive advantage; you just need the ability to collect and analyze it.

Hadoop, known mostly for analyzing large datasets in batch, is rapidly changing. “Just In Time” processing is now a reality. SQL and NoSQL are getting mashed together, and data stored in HDFS no longer has to be moved out to be analyzed.

Battle of the Distros

MapR – Taking the approach of shipping a more proprietary version of Hadoop. Fast out of the gate, they seem to be doing well. My fear is that they will get too far down one path and won’t be able to use the power of the community. They do have committers on their team, so that should help. They also have partnerships with Google and Amazon.

Cloudera – A mix between proprietary and open source. They have seen success, and it can be attributed to the tools they have built to help run and maintain their distribution. Lots of talk about Impala, which promises super fast query performance against HDFS and HBase. Jeff Hammerbacher, previously at Facebook, gives them a lot of street cred.

Hortonworks – Taking the long-term approach, Hortonworks is 100% open source. They make their revenue off training and support services for their distribution. They have an Impala-like project called Stinger. The difference is that they are still using Hive, just speeding it up by orders of magnitude. I personally dig Hortonworks because they seem to have strong support for virtualizing Hadoop. I also like Hortonworks’ partnership with Microsoft, which is sure to help speed up SQL performance.

Intel – Seems to be focusing on security, making the best use of their CPUs and SSDs for compression and encryption. Personally, I don’t see how that gives them a leg up on the other distributions, as all of them could use Intel’s hardware. Intel seems to be going the OEM route, which is not surprising. I think their relationship with SAP will bode really well for them in the enterprise space.

Please leave your own thoughts; it’s a very interesting landscape.
