Consistent Performance and Availability with Node Based Architecture


Everybody wants to talk about performance because it’s the sexy topic, but it’s not the only decision point. You can have all the performance in the world, but if you’re not highly available it doesn’t really matter. If performance were the only factor we would all just run PCI-based flash in a server and call it a day. With traditional storage, an active/passive architecture is one way to ensure consistent performance. Yes, it wastes available performance most of the time, but that doesn’t make it wrong; it was an architectural choice. In node-based architecture, consistent performance and availability have to be put above drag racing numbers, because the likelihood of a node going down is simply mathematically higher: the more nodes you have, the more often one of them will be down. This is why Nutanix designed around distributing everything up front. It’s the ability to fail hard and fail fast and live to see the next failure. (Side note: This is also why people talking about 64-node all-flash clusters with a fault tolerance of 1 make me chuckle.)
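
To put a rough number on "mathematically higher", here is a minimal sketch. It assumes every node is independently down with the same probability at any given moment; the 1% figure and the function names are made up for illustration and are not measured Nutanix numbers.

```python
def prob_any_node_down(nodes: int, p_node: float) -> float:
    """Chance that at least one of `nodes` is down, assuming independent failures."""
    return 1.0 - (1.0 - p_node) ** nodes


def prob_two_or_more_down(nodes: int, p_node: float) -> float:
    """Chance that two or more nodes are down at once, which is
    more than a fault tolerance of 1 is meant to absorb."""
    none_down = (1.0 - p_node) ** nodes
    exactly_one_down = nodes * p_node * (1.0 - p_node) ** (nodes - 1)
    return 1.0 - none_down - exactly_one_down


if __name__ == "__main__":
    p = 0.01  # hypothetical per-node probability of being down
    for n in (4, 16, 64):
        print(n, round(prob_any_node_down(n, p), 3), round(prob_two_or_more_down(n, p), 3))
```

Real failures are not perfectly independent (shared power, shared switches, rolling upgrades), so treat this as the direction of the trend rather than a sizing tool.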

Some design decision points around Nutanix:

* Nutanix decided to always write the minimum required two copies of data. Lots of other node-based architectures will only write one copy if the destination node for the remote copy is down or being upgraded. The trade-off is that always auto-leveling and spreading the load probably costs Nutanix more in terms of CPU, but the performance is consistent and available. Big bulk sync operations don’t take place. You don’t have to manually migrate data around the countryside.

* Hot data in the flash tier always has two copies, so if a node goes down you don’t have to warm up the flash tier again. The trade-off is space, but inline compression can help in this area. Consistency was chosen over performance.

* Secondary copies are not sent to static nodes. Spreading the load gives more consistent performance and better rebuild times. Nutanix also chose to spread data at the vdisk level rather than the VM level.

* Data Locality – The local copy of data reduces network congestion and gives fast read performance.

* Automatic DRS is fully supported. Maintenance operations are going to happen. You don’t have to figure out the home node of the VM.

* Nutanix rebuilds active data to the SSD tier and rebuilds cold data to the HDD tier. Active data is quickly rebuilt, and cold data does not impact performance during a rebuild.

* Any VM can get access to all of the SSD resources in the cluster if the local SSD tier is full. We have some CPU cost in managing this with Apache Cassandra, but it’s highly optimized. The benefit is that the working set can be larger than the flash and controllers of two nodes. Performance does not depend on dedupe or compression yielding large savings.

* Self healing – As long as you have the capacity and enough nodes, you can continually heal yourself. For example, with an 8-node cluster you can lose a node, heal, lose a node, heal, lose a node, heal, until you get down to three nodes. This is one reason why the 6035c storage KVM node, with the ability to be attached to ESXi and Hyper-V clusters, is just awesome.

* Access to all of the local resources. We allow multiple SSDs and HDDs to live in one storage pool. If data is going from SSD to HDD you have access to all 20 HDDs, even though the host may have multiple SSDs. Also, the down-tiering is not affected by a RAID write penalty.

* HCL is worry free – With pure software you have to check with the hypervisor vendor and then the hardware manufacturer to see what both are going to support. Both sides can change, and then you can be left scratching your head about what to do next instead of fixing whatever the real problem is. So while you might not see NVMe supported on day 1, you will have a highly available system with combat-tested components.

* Checksums – Every time, baby, no exceptions. Consistency is always ensured.

* Scale – Nutanix always operates the same way, so you know what you’re getting, which leads to consistency as you scale. We don’t flip any bits after X nodes that change resource consumption and may affect performance.
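
The self-healing point above is really just arithmetic, so here is a rough sketch of it. The capacity model is an assumption on my part (every node has the same raw capacity, every piece of data needs two copies, nothing else is consuming space), and `heal_sequence` is a hypothetical helper, not a Nutanix tool. The three-node floor comes from the example in the list.

```python
def heal_sequence(nodes, capacity_per_node_tb, logical_data_tb, copies=2, min_nodes=3):
    """Return the node counts at which the cluster can still fully
    re-protect its data after losing one node at a time."""
    raw_needed_tb = logical_data_tb * copies  # every piece of data is stored `copies` times
    healed_at = []
    while nodes - 1 >= min_nodes:
        nodes -= 1  # lose a node
        if nodes * capacity_per_node_tb < raw_needed_tb:
            break  # not enough surviving capacity to rebuild the lost copies
        healed_at.append(nodes)  # heal
    return healed_at


if __name__ == "__main__":
    # 8 nodes with 10 TB raw each: a light cluster heals all the way down to three nodes
    print(heal_sequence(8, 10, 12))
    # a fuller cluster runs out of room for second copies sooner
    print(heal_sequence(8, 10, 20))
```

The takeaway is the "as long as you have the capacity" part: how many failures you can ride through depends on how full you run the cluster, not just on how many nodes you started with.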

There are always trade-offs to consider with your workloads. Management of secondary copies in node-based architecture is extremely important and, in my opinion, should take precedence over performance.
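
As a back-of-the-envelope illustration of why where the secondary copies live matters, the sketch below compares rebuild time when a failed node’s second copies all sit on one static partner versus being spread across every surviving node. The throughput figure and the `rebuild_hours` helper are hypothetical, and the model ignores network limits and competing workload I/O.

```python
def rebuild_hours(data_on_failed_node_tb, survivors, per_node_rebuild_tb_per_hour, spread=True):
    """Estimate the time to re-protect the data that lived on a failed node.

    spread=True:  surviving copies are distributed, so every survivor helps.
    spread=False: a single static partner node holds all the surviving copies.
    """
    sources = survivors if spread else 1
    return data_on_failed_node_tb / (sources * per_node_rebuild_tb_per_hour)


if __name__ == "__main__":
    # 8-node cluster loses one node holding 8 TB; each node can rebuild 0.5 TB/hour (assumed)
    print(rebuild_hours(8, 7, 0.5, spread=False))
    print(rebuild_hours(8, 7, 0.5, spread=True))
```

The shorter the rebuild, the shorter the window in which a second failure can hurt you, which is the availability argument in a nutshell.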
