EVO RAIL: Status Quo for Nutanix

Some will make a big splash about the launch of EVO RAIL, but the reality is that things remain status quo. While I do work for Nutanix and I am admittedly biased, the fact is that Nutanix as a company was formed in 2009 and has been selling since 2011. VSAN and now EVO RAIL are a validation of what Nutanix has been doing over the last 5 years. In this case, a rising tide lifts all boats.

Nutanix will continue to partner with VMware on all solutions, such as VDI, RDS, SRM, server virtualization, big data applications like Splunk, and private cloud. Yes, we will compete with VSAN, but I think the products are worlds apart, mostly due to architectural decisions. Nutanix helps to sell vSphere and enables all the solutions that VMware provides today. Nutanix has various models that serve everything from Tier 1 SQL\Oracle all the way down to the remote branch where you might want only a handful of VMs. Today EVO RAIL is positioned to serve only Tier 2, Test/Dev and VDI. The presentation I sat in on as a vExpert confirmed Tier 1 was not a current use case. I do feel that this is a mistake for EVO RAIL. By not being able to address Tier 1, in which I would include VDI, you end up creating silos in the data center, which is everything that SDDC should be trying to eliminate.

Some of the available Nutanix use cases


Nutanix is still King of Scale, but I am interested to hear more about EVO RACK, which is still in tech preview. EVO RAIL in version 1.0 will only scale to 16 nodes\servers, or 4 appliances. Nutanix doesn’t really have a limit but tends to follow hypervisor limits; most Nutanix RAs are around 48 nodes from a failure domain perspective.

Some important differences between Nutanix and EVO RAIL:

* Nutanix starts at 3 nodes, EVO RAIL starts at 4 nodes.

* Nutanix uses Hot Optimized Tiering based on data analysis and cache from RAM, which can be deduped. EVO RAIL uses caching from SSD (70% of all SSD is used for cache).

* You can buy 1 Nutanix node at a time; EVO RAIL is only sold 4 nodes at a time, though I think this has to do with trying to keep a single SKU. The SMB end of the market will find it hard to make this jump. On the Enterprise side you need to be able to have different node types if your compute\capacity doesn’t match up.

* Nutanix can scale with different node types offering different levels of storage and compute. EVO RAIL today is a hard-locked configuration: you are unable to even change the amount of RAM from the OEM vendor. CPUs are only 6 core, which leads to needing more nodes and therefore more licenses.

* EVO RAIL is only spec’d for 250 desktops\100 general server VMs per appliance. Nutanix can deliver 440 desktops per 2U appliance with a medium Login VSI workload, and 200 general server VMs when enabling features like inline dedupe on the 3460 series. In short, we have no limits if you don’t have CPU\RAM contention.


* Nutanix has 1 Storage Controller (VM) per host that takes care of VM-caliber snapshots, inline compression, inline dedupe, MapReduce dedupe, MapReduce compression, analytics, cluster health, replication and hardware support. EVO RAIL will have the EVO management software (web server), a vCenter VM, a Log Insight VM, a VM from the OEM vendor for hardware support, and a vSphere Replication VM if needed.

* Nutanix is able to have separation between compute and storage clusters. EVO RAIL is one large compute cluster with only one storage container. By having separation you can have smaller compute clusters and still enjoy one giant volume. This is really just an issue of having flexibility in design.

* Nutanix can run with any license of vSphere; the EVO RAIL license is Enterprise Plus. I am not sure how that will affect pricing. I suspect the OEMs will be made to keep it at normal prices because it would affect the rest of their business.

* Nutanix can manage multiple large\small clusters with Prism Central. EVO RAIL has no multi-cluster management.

* With Nutanix you get to use all of the available hard drives for all of the data out of the box. With EVO RAIL you have to increase the stripe width to take advantage of all the available disks when data is moved from cache to hard disk.

* Nutanix offers both analysis and built-in troubleshooting tools in the Virtual Storage Controller. You don’t have to add another VM to provide these services.
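To put the per-appliance figures from the list above in perspective, here is a rough back-of-the-envelope sketch. It only uses the numbers quoted in this post (250 and 440 desktops per appliance, 4 appliances per EVO RAIL 1.0 cluster); the 1,000-desktop target and the function names are made up for illustration, and real sizing depends on the workload.

```python
import math

# Per-appliance figures quoted in the list above.
EVO_RAIL_DESKTOPS_PER_APPLIANCE = 250
NUTANIX_DESKTOPS_PER_APPLIANCE = 440  # 2U appliance, medium Login VSI workload
EVO_RAIL_MAX_APPLIANCES = 4           # 16 nodes in version 1.0


def appliances_needed(desktops: int, per_appliance: int) -> int:
    """Whole appliances required to host the given number of desktops."""
    return math.ceil(desktops / per_appliance)


def evo_rail_max_desktops() -> int:
    """Desktop ceiling for one EVO RAIL 1.0 cluster at the quoted spec."""
    return EVO_RAIL_DESKTOPS_PER_APPLIANCE * EVO_RAIL_MAX_APPLIANCES


if __name__ == "__main__":
    target = 1000  # hypothetical deployment size, not from the post
    print("EVO RAIL appliances:",
          appliances_needed(target, EVO_RAIL_DESKTOPS_PER_APPLIANCE))
    print("Nutanix appliances:",
          appliances_needed(target, NUTANIX_DESKTOPS_PER_APPLIANCE))
    print("EVO RAIL 1.0 cluster ceiling:", evo_rail_max_desktops())
```

At the quoted spec, a 1,000-desktop target would fill an entire EVO RAIL 1.0 cluster, with no room left to grow inside that cluster.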

Chad Sakac mentioned in one of his articles that if “my application stack has these rich protection/replication and availability SLAs – because it’s how it was built before we started thinking about CI models”, you might not pick EVO RAIL and would go to a Vblock instead. I disagree on the CI part. Nutanix has the highest levels of data protection today: synchronous writes, bit rot prevention, all data is checksummed, data is continuously scrubbed in low periods, and Nutanix-based snapshots for backup and DR.

It’s a shame that EVO RAIL went with the form factor they did. VSAN can lose up to 3 nodes at any one time, which is good, but in the current design it will need 5 copies of data to ensure that a block going down will not cause data loss when you go to scale the solution. I think they should have stayed with a 1 node – 2U solution. Nutanix has a feature called Availability Domains that allows the cluster to survive a whole block going down and still function. This feature doesn’t require any additional storage capacity, just the minimum two copies of data.

More information on Availability Domains can be found in the Nutanix Bible.
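The capacity cost of keeping extra copies is easy to sketch. The snippet below is a simplified illustration, assuming usable capacity is just raw capacity divided by the number of full copies; it ignores metadata, cache, free-space reserves, dedupe and compression. The 100 TB raw figure is hypothetical.

```python
def usable_capacity_tb(raw_tb: float, copies: int) -> float:
    """Usable capacity when every piece of data is stored `copies` times.

    Simplifying assumption: capacity scales as raw / copies, with no
    allowance for metadata, cache, free-space reserves or data reduction.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return raw_tb / copies


if __name__ == "__main__":
    raw_tb = 100.0  # hypothetical raw capacity, not from the post
    for copies in (2, 5):
        usable = usable_capacity_tb(raw_tb, copies)
        print(f"{copies} copies: {usable:.0f} TB usable of {raw_tb:.0f} TB raw")
```

Under that assumption, 5 copies leave 20% of raw capacity usable, while the minimum two copies leave 50%.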


* Nutanix can scale past 32 nodes. VSAN is supported for 32 nodes, yet EVO RAIL is only supported for 16 nodes. I don’t know why they made this decision.

* Prism Element has no limit on the number of objects that it can manage. EVO RAIL is still limited by the number of components. I believe that the limited hardware specs are being used to limit the number of components so this does not become an issue in the field.

* With Nutanix, when you add a node you can enjoy the performance benefits right away. With EVO RAIL you have to wait until new VMs are created to make use of the new flash and hard drives (or perform a maintenance operation). A lot of this has to do with how Nutanix controls the placement of data; data locality helps with this.

I think the launch of EVO RAIL shows how important hardware still is when achieving five 9s of availability. Look out, dual-headed storage architectures: your lunch just got a little smaller again.


VMware Horizon View 6 – Impact of VCAI

View Composer Array Integration with Native NFS Snapshot Technology (VCAI) started off as a tech preview in View 5.1 but is now fully supported. The example below highlights the impact of not having VCAI support if you’re using View Composer in your environment to deploy desktops.

Nutanix supports VCAI

Everything to the left of the line is a result of not having VCAI support. Your golden image has to be copied over to create the new replica image as the base for the new desktops. Over 11,000 IOPS are used in this example and over 700 MBps of bandwidth is consumed. Multiply this by how many golden images your team supports, plus the extra time it takes to copy the image over. There is also an impact on users that have to work during the maintenance period.

If you’re using VCAI your deployment journey would begin to the right of the line. Nutanix fully supports VCAI and can also deploy full clones without View Composer.

45 minutes to deploy 400 desktops with VCAI
Time is saved by not having to do the full copy, and VCAI provides better caching of reads on Nutanix as well. Without VCAI it would have been north of 50 minutes, and the performance tier would have been used for the copy instead of being kept to deliver a great user experience.

Want to see this in action at VMworld? Stop by booth 1535 for a demo.


Why #Webscale Reason 5: Brain Drain, Training Budgets & Turnkey Solutions

Companies like Google, Amazon and Facebook had to invent (code) new technologies and approaches to doing IT because no alternative to traditional IT existed. Lots of the technologies surrounding this can be complicated and do take a highly trained team to forge ahead. Web-scale is not an all-or-nothing proposition. Today we’ve reached a point where the principles and patterns are well understood and turnkey enterprise-class solutions are emerging to bring web-scale capabilities to the enterprise. These don’t require PhDs to operate. Even some of the industry storage giants like EMC are trying to deploy similar technologies to provide true scale-out technologies. Nutanix has been building upon these technologies since 2009 so people can do more with less. An IT admin has the option of never leaving the Prism UI if they want.

Like it or not, enterprise IT is fighting with the cloud for relevance. Enterprise IT is not that way by choice. The politics and finger-pointing are what traditional infrastructure constraints and complexity have created. Budget constraints are all the more reason why you need an alternative. If you have the opportunity to learn one skill to save countless hours down the road, is that not a fair price to pay? I remember an old boss questioning my VMware 3.0 training over the same things. Do I need it? Is it valuable? Many of the skills that were considered niche 5 years ago are now mainstream. Companies like Nutanix are eliminating the need for specialized talent by delivering turnkey solutions that are web-scale inside but provide enterprise capabilities, offering the best of both worlds.

The reason why VMware SRM was invented was so people could get out of the weeds of scripting and engineering their DR plans. When people changed jobs or left the company, you wouldn’t have to worry about the next lady/man stepping in to fill their shoes and figure out the failover process if a disaster were to occur.

With any new technology or paradigm shift there needs to be a way to bridge the gap between the two worlds. The difference between public and private cloud in this case is learning a UI and hiding the complexity. Virtualization is a key aspect of Nutanix, so a lot of skills will work in both the old and the new land of the datacenter.


Why #Webscale Reason #4: Machine Data & Analytics #Nutanix #Linkedin

When you open your infrastructure up to APIs and have a platform to automate all aspects, it allows for a common management and analytics platform. Silos of infrastructure not only put additional strain on storage performance with the IO blender effect but also make it harder to manage the wealth of data that is generated. Google’s ability to collect and analyze has changed the game for them. With different hardware, different data centers and different use cases to contend with, it’s all about managing the whole story and seeing problems before they end up on your CIO’s dashboard. This can really only be done with a shared-nothing architecture.

Look at how LinkedIn is doing it. Similar aspects to the Nutanix design.

Want to learn more? Great live info is coming here.


Why #Webscale Reason #3: It’s about the people – #Twitter #APIgee #DataStax #Nutanix

It’s not all about wing dings and nuts & bolts. It’s easy to get lost in the weeds of technology and forget the greater purpose of why an IT department exists. When technology religion starts to dictate what is right for the business it can easily turn into a dead-end street. People and process are the hardest things inside of tech, and that is where web-scale plays a part. Web-scale is about launch first, optimize later. Focusing on what you’re good at and getting to the last 10% can be an iterative process. It’s not about speeds and feeds, it’s about getting your teams to focus on the business and work together. It’s breaking down traditional silos and helping move the needle. I believe the general sysadmin will have a long life ahead of them versus people that are totally focused on one area.

At Nutanix we have no religion on hardware. Today we OEM through Super Micro; tomorrow we could switch if the economics, performance and form factor made sense.

Launch first has allowed Nutanix to get to MapReduce Dedupe (Post-Process) probably in one of the quickest fashions. It started with inline dedupe for performance, which was put into production and built upon work from our Medusa/Cassandra team. Then came MapReduce Dedupe, focusing on OS and application data. Over time more algorithms will be added to MapReduce Dedupe, which will potentially lead to more features.

From a customer perspective, launch first gives you more options to make a better decision. This is another reason why hybrid cloud will succeed.

“If all you have is a hammer, everything looks like a nail”

Catch a live tech panel on Wednesday June 25th, 2014 – 10:00AM–10:45AM PDT

Designing and Building Web-scale Systems

Panel line-up:

Dmitriy Ryaboy (Engineering lead at Twitter)
Karthik Ranganathan (Engineer at Nutanix)
Anant Jhingran (CTO of APIgee, IBM Fellow)
Darshan Rawal (Director of Product Management, DataStax)



NOS 4.0 PowerShell – Add All Your VMs To A Protection Domain

There have been some requests from customers, including on the Nutanix Next Community Site, asking for a way to automatically add all the VMs to a protection domain or at least have a default. The good thing about still using per-VM replication in this use case is that we are not sending the vswap files across the network\WAN. If the data has already been deduped and sent over the wire, you won’t have to send it again.

Using PowerShell you can accomplish this with just two lines of code, maybe 1 if you’re smarter than me (not hard!).

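# Collect every VM that is not yet in a protection domain.
$unprotectedvms = Get-UnprotectedVms
# Add each one, by name, to the protection domain PD-2.
foreach ($x in $unprotectedvms) { Add-VmsByNamesToProtectionDoman -name PD-2 -Names $x.vmName }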

Nutanix PowerShell

Thankfully you can remove them out of the protection domain easily too.


#Nutanix 1 TB HDD Rebuild in 3 min – Video

The video below shows the impact of a drive rebuild when the cluster is under no load. I used the Xangati dashboard to track internode communication. Nutanix does have QoS over drive rebuilds. The important thing is that HDDs rebuild to other HDDs and don’t impact performance by flooding the flash tier. This data is considered cold so it shouldn’t impact the write path.

A Nutanix cluster is limited to roughly 40 MB/s per node for rebuilding HDDs. Nutanix can achieve linear rebuild times because the data is evenly spread out through the cluster. More info here.

* The HDD being rebuilt is 1 TB. It had 24 GB of data sitting on it.
* The HDD was removed using the UI.
* The 3 min time is for the rebuild. NOS will perform another map-reduce job to verify everything is rebuilt before releasing the drive. The secondary map-reduce job is considered low priority because the data in this case still exists.
* To add the HDD back into the cluster you have to use edit-zeus to remove the tombstone record, as this isn’t considered a normal procedure.
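The numbers above can be sanity-checked with a little arithmetic. This is only a sketch, assuming decimal units (1 GB = 1000 MB) and that every node in the cluster contributes at the roughly 40 MB/s cap; the node counts and function names are hypothetical and not taken from the test cluster.

```python
PER_NODE_REBUILD_MB_S = 40.0  # approximate per-node cap quoted above


def observed_throughput_mb_s(data_gb: float, seconds: float) -> float:
    """Aggregate rebuild throughput implied by a measured rebuild."""
    return data_gb * 1000.0 / seconds


def rebuild_seconds(data_gb: float, nodes: int,
                    per_node_mb_s: float = PER_NODE_REBUILD_MB_S) -> float:
    """Estimated rebuild time if all nodes rebuild in parallel at the cap."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return data_gb * 1000.0 / (nodes * per_node_mb_s)


if __name__ == "__main__":
    # 24 GB rebuilt in 3 minutes, as in the video.
    rate = observed_throughput_mb_s(24, 3 * 60)
    print(f"Observed aggregate rate: {rate:.1f} MB/s")
    print(f"Equivalent nodes at the cap: {rate / PER_NODE_REBUILD_MB_S:.1f}")

    # Hypothetical cluster sizes: doubling the nodes halves the estimate.
    for nodes in (4, 8, 16):
        print(f"{nodes} nodes: {rebuild_seconds(24, nodes):.1f} s")
```

Because the estimate divides by the node count, it shows the linear behaviour described above: the more nodes share the rebuild, the shorter it gets.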


Data Locality: Congestion Not Latency

I love Nutanix data locality for helping with the noisy neighbour syndrome and because it helps in spreading the data evenly across the cluster. Spreading the data across the cluster has an impact on rebuild times and bottlenecks.

Our Director of Engineering, Vishal Sinha, brought up another good point yesterday around data locality. He mentioned that what kills network performance is not latency but congestion. Congestion can come in many forms: microbursts (for example a 200 Mb burst arriving within 10 ms, which equates to 20 Gbps of traffic on a 10G port for that 10 ms, resulting in lots of traffic getting dropped if the switch does not have enough buffer space), or a misbehaving NIC sending PAUSE frames and slowing down the network.
Our data locality feature drastically reduces the chances of running into network-related storage issues since we don’t rely on the network for reads and we need to write to only one remote node. It’s all about reducing coupling and dependency between various components, and limiting resource consumption. Do more with less, make components independent == scalability. Data locality is a core component of distributed systems, for example Hadoop.
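The microburst arithmetic is worth spelling out. The sketch below reads the example as 200 megabits arriving within a 10 ms window on a 10 Gbps port; that reading, the function names, and the idea of expressing the overflow as required buffer space are my assumptions for illustration.

```python
def equivalent_rate_gbps(burst_megabits: float, window_ms: float) -> float:
    """Instantaneous rate of a burst; megabits per millisecond equals Gbps."""
    return burst_megabits / window_ms


def excess_megabits(burst_megabits: float, window_ms: float,
                    port_gbps: float) -> float:
    """Traffic the port cannot forward within the window.

    This is what the switch has to buffer or drop.
    """
    port_capacity_megabits = port_gbps * window_ms
    return max(0.0, burst_megabits - port_capacity_megabits)


if __name__ == "__main__":
    burst, window, port = 200.0, 10.0, 10.0  # Mb, ms, Gbps
    rate = equivalent_rate_gbps(burst, window)
    excess = excess_megabits(burst, window, port)
    print(f"Burst rate: {rate:.0f} Gbps on a {port:.0f} Gbps port")
    print(f"Excess to buffer or drop: {excess:.0f} Mb ({excess / 8:.1f} MB)")
```

The average utilisation over a full second can look tiny while the port is oversubscribed two to one for those 10 ms, which is why congestion rather than latency is the thing to watch.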

If you want to read more about microbursts, here is a link:


Scale-Out Storage – In the Hypervisor Kernel or in a VM?

A new tech note from Nutanix discusses architectural considerations for implementing a converged, scale-out storage fabric that runs across a cluster of nodes. The paper focuses on high availability and resiliency for virtualizing business-critical applications. It covers running storage services embedded in the hypervisor kernel and as a virtual machine in user space.

Scale-Out Storage – In the Hypervisor Kernel or in a VM?



When To Use: Nutanix Shadow Clones vs VCAI

First kick at the can with Google Hangouts. Apparently I didn’t set up the webinar on air correctly, but the content made it out alive!

Post any questions below.