Today Nutanix publicly launched our new operating system version 3.0 and new hardware. While this is a big release for us, I want to focus on disaster recovery and, more importantly, how to get there without blowing the hinges off the bank. More information on NX-3000 Series Hardware and NOS 3.0 can be found here.
Having a disaster recovery plan is, for most people, like being healthy. You know you should be healthy, but you keep kicking the can down the road because it’s not easy, and both cost more money than the alternatives. We usually only start eating healthy after something bad has happened to us or a loved one. The same is true for IT departments: no one wants to pay for a duplicate datacenter because of the cost, the staffing and the additional complexity.
I think it’s smart to start small when building out your DR plan. There are a lot of people and processes involved, and you need to make sure everyone understands all of the risks and costs. Most companies don’t need DR for all their applications, but usually the exec team can agree on the top 3 or 4. By the same token, you don’t want to buy gear that can’t scale, or you end up with a lot of toys instead of the performance needed to run an application as if it were running in its original home.
One of Nutanix’s key strengths is its ability to scale linearly. Scalability is why Nutanix is a slam dunk for VDI proofs of concept. You don’t have to throw away your test gear; you can just add more nodes when needed. The same is true for DR. Most companies will give only a few applications the green light for DR at first. Over time, as budget allows, you can get all of your applications covered. It wouldn’t make a lot of sense to spend all the money on a totally new datacenter and then wait months or years sorting out all of the interdependencies.
Another key factor where Nutanix shines is time to value. Once the nodes are racked, you can have your DR site up and running in 30 minutes or less, plus whatever time the networking takes, such as creating the VPN tunnels. This is very important since you are probably not going to get more staff for your second datacenter. Like the Googles and Amazons of the world, you are going to have to start increasing your VM-to-admin ratio in order to stay relevant in your own datacenter.
While the next section will highlight the new Nutanix native replication, I’d like to point out that third-party products are a great option for replicating your data off an existing storage array onto a Nutanix box. You always have to pay for replication, so you might as well use a third-party tool if your existing storage array expects to replicate a whole LUN/volume instead of just the VMs you want. Zerto, Veeam and vRanger are all solid choices. You can use vSphere Replication for free, but from what I can tell thus far it doesn’t do a great job of distributing the load.
Best in class disaster recovery: Highly flexible VM-aware DR with unprecedented master-master (multi-way) replication configuration possibilities, never seen in traditional arrays before. In the pre-virtualization era, arrays would replicate entire volumes (LUNs or file systems) one-way, and would punt the run-book automation to the host. With Nutanix, virtualization administrators can select VMs that form protection domains and consistency groups, then replicate the protection domain to multiple destinations. Nutanix also delivers basic run-book automation without the need for any hypervisor assists. Nutanix replication will handle the failover and failback, including the registration of VMs in the site’s vCenter. Today, VMware SRM is only supported via vSphere Replication, but the plan is to support SRM with Nutanix replication in a future release.
As of the 3.0 release, replication is async only, but there are plans to eventually move to synchronous. Like the Nutanix File System, replication is distributed across the cluster. You don’t have just a few storage controllers, as in a traditional architecture, causing a bottleneck during replication operations. Performance is high because all nodes in the cluster replicate simultaneously. There is an option to proxy the traffic if you must. All replication is deduped at the source, so redundant blocks are not sent across the network.
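To make the "deduped at the source" idea concrete, here is a minimal sketch in Python. It is not how Nutanix implements it; the fixed 4 KB block size, the SHA-256 fingerprints and the function names are all assumptions made up for illustration. The source fingerprints each block and only ships blocks the remote site does not already hold.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def fingerprint(block: bytes) -> str:
    """Content fingerprint used to recognise identical blocks."""
    return hashlib.sha256(block).hexdigest()


def split_blocks(data: bytes, size: int = BLOCK_SIZE):
    """Cut the data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def plan_replication(data: bytes, remote_fingerprints: set):
    """Decide at the source which blocks actually need to cross the wire.

    Returns (to_send, recipe): to_send maps fingerprint -> block for blocks
    the remote site lacks; recipe is the ordered list of fingerprints the
    remote site needs to rebuild the data.
    """
    to_send = {}
    recipe = []
    for block in split_blocks(data):
        fp = fingerprint(block)
        recipe.append(fp)
        # Skip blocks the remote already has, and duplicates within this batch.
        if fp not in remote_fingerprints and fp not in to_send:
            to_send[fp] = block
    return to_send, recipe


def apply_replication(to_send: dict, recipe: list, remote_store: dict) -> bytes:
    """Hypothetical remote side: store the new blocks, then rebuild the data."""
    remote_store.update(to_send)
    return b"".join(remote_store[fp] for fp in recipe)
```

In this sketch, replicating three blocks where two are identical sends only two blocks, and a later pass where a single block changed sends only that one block.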
Replication can also roll up snapshots based on the schedule you set.
As an example, you could set the following policy:
● last 6 hourly snapshots
● last 7 daily snapshots
● last 4 weekly snapshots
● last 3 monthly snapshots
● last 1 yearly snapshot
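For a feel of what a roll-up like this means in practice, here is a small Python sketch of one common way to evaluate such a policy. It is an assumption on my part, not the actual NOS logic: for each tier it keeps the newest snapshot from each of the most recent N periods (hours, days, ISO weeks, months, years) that contain a snapshot. The names POLICY, BUCKET_KEYS and snapshots_to_keep are hypothetical.

```python
from datetime import datetime, timedelta

# The example policy from the list above.
POLICY = {"hourly": 6, "daily": 7, "weekly": 4, "monthly": 3, "yearly": 1}

# How a snapshot timestamp maps to a period ("bucket") for each tier.
BUCKET_KEYS = {
    "hourly": lambda t: (t.year, t.month, t.day, t.hour),
    "daily": lambda t: (t.year, t.month, t.day),
    "weekly": lambda t: (t.isocalendar()[0], t.isocalendar()[1]),  # ISO year, week
    "monthly": lambda t: (t.year, t.month),
    "yearly": lambda t: (t.year,),
}


def snapshots_to_keep(snapshots, policy=POLICY):
    """Return the set of snapshot timestamps the policy retains.

    For each tier, walk the snapshots newest first and keep the first
    (newest) snapshot seen in each period, until the tier's count is met.
    Anything not returned is a candidate for expiry.
    """
    keep = set()
    newest_first = sorted(snapshots, reverse=True)
    for tier, count in policy.items():
        seen = []
        for snap in newest_first:
            key = BUCKET_KEYS[tier](snap)
            if key not in seen:
                seen.append(key)
                keep.add(snap)
                if len(seen) == count:
                    break
    return keep


if __name__ == "__main__":
    # Ten days of hourly snapshots ending at an arbitrary example time.
    end = datetime(2012, 12, 4, 12, 0)
    snaps = [end - timedelta(hours=i) for i in range(240)]
    for snap in sorted(snapshots_to_keep(snaps), reverse=True):
        print(snap)
```

One snapshot can satisfy several tiers at once (the newest snapshot is also the newest daily, weekly, monthly and yearly one), so the number retained is usually well below the sum of the counts in the policy.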
Remember, eventually a disaster will happen. It doesn’t have to be a natural disaster, and in most cases it will be human error that causes the DR event.