Jun
30

The Down Low on Near-Sync On Nutanix

Nutanix refers to its current implementation of redirect-on-write snapshots as vDisk based snapshots. Nutanix has continued to improve on its snapshot implementation by adding Light-Weight Snapshots (LWS) to provide near-sync replication. LWS uses markers instead of creating full snapshots for RPOs of 15 minutes and under. LWS further reduces the overhead of managing metadata and removes the overhead that long snapshot chains cause when snapshots are taken very frequently. The administrator doesn’t have to worry about setting a policy to choose between vDisk snapshots and LWS. Acropolis Operating System (AOS) will transition between the two forms of replication based on the RPO and the available bandwidth. If the network can’t handle the low RPO, replication will transition out of near-sync. When the network can meet the near-sync requirements again, AOS will start using LWS again. In over-subscribed networks, near-sync can provide almost the same level of protection as synchronous replication without impacting the running workload.

The administrator only needs to set the RPO; no knowledge of near-sync is needed.

The trade-off is that all changes are handled in SSD when near-sync is enabled. Because of this trade-off, Nutanix reserves a percentage of SSD space for LWS when it’s enabled.

[Diagram: near-sync]

In the above diagram, a vDisk based snapshot is first taken and replicated to the remote site. Once the full replication is complete, LWS begins on the set schedule. If no remote site is set up, LWS happens locally right away. If you have the bandwidth available, life is good, but that’s not always the case in the real world. If you miss your RPO target repeatedly, replication automatically transitions back to vDisk based snapshots. Once vDisk based snapshots complete fast enough, replication automatically transitions back to near-sync. Transitions both out of and into near-sync are controlled by advanced settings called gflags.
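
The transition behaviour described above can be sketched as a small state machine. This is only an illustration, not AOS code: the class name, the two thresholds (`max_missed_rpo`, `min_fast_snapshots`) and their default values are hypothetical stand-ins for the gflags, whose real names and values aren't covered here. It also assumes that "repeatedly" means consecutive misses.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    VDISK_SNAPSHOT = "vdisk"  # full vDisk based snapshots
    NEAR_SYNC = "near-sync"   # light-weight snapshots (LWS)


@dataclass
class ReplicationModeTracker:
    """Toy model of moving into and out of near-sync (hypothetical names)."""

    rpo_seconds: float
    max_missed_rpo: int = 3      # hypothetical threshold, not a real gflag
    min_fast_snapshots: int = 3  # hypothetical threshold, not a real gflag
    mode: Mode = Mode.VDISK_SNAPSHOT  # replication starts with a vDisk snapshot
    _missed: int = 0  # consecutive RPO misses while in near-sync
    _fast: int = 0    # consecutive on-time vDisk snapshot replications

    def record(self, replication_seconds: float) -> Mode:
        """Record how long one replication took and return the resulting mode."""
        met_rpo = replication_seconds <= self.rpo_seconds
        if self.mode is Mode.NEAR_SYNC:
            self._missed = 0 if met_rpo else self._missed + 1
            if self._missed >= self.max_missed_rpo:
                # Network can't keep up: fall back to vDisk based snapshots.
                self.mode, self._missed, self._fast = Mode.VDISK_SNAPSHOT, 0, 0
        else:
            self._fast = self._fast + 1 if met_rpo else 0
            if self._fast >= self.min_fast_snapshots:
                # Snapshots replicate fast enough again: resume near-sync.
                self.mode, self._missed, self._fast = Mode.NEAR_SYNC, 0, 0
        return self.mode


if __name__ == "__main__":
    tracker = ReplicationModeTracker(rpo_seconds=60)
    for seconds in [40, 45, 50, 90, 120, 150, 30]:
        print(f"replication took {seconds}s -> {tracker.record(seconds).value}")
```

The point of the sketch is that the administrator never picks a mode; the tracker only sees the RPO and the observed replication times, which is how the post describes AOS behaving.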
On the destination side, AOS creates hydration points. A hydration point is a way for LWS to transition into a vDisk based snapshot. The process for inline hydration is:

1. A staging area is created for each VM (consistency group, CG) that’s protected by the protection domain.
2. The staging area is essentially a directory with a set of vDisks for the VM.
3. Any new incoming LWSs are then applied to that same set of vDisks.
4. The staging area can be snapshotted from time to time, which gives you individual vDisk-backed snapshots.
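
The hydration steps above can be illustrated with a toy in-memory model. Everything here is an assumption made for illustration: the `StagingArea` class, its method names, and the representation of a vDisk as a mapping from block offset to data are hypothetical, and a deep copy stands in for the snapshot, which the real system would not do.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List

# Toy vDisk: block offset -> block contents (hypothetical representation).
VDisk = Dict[int, bytes]


@dataclass
class StagingArea:
    """Per-VM (CG) staging area holding a set of vDisks (hypothetical model)."""

    vdisks: Dict[str, VDisk] = field(default_factory=dict)
    snapshots: List[Dict[str, VDisk]] = field(default_factory=list)

    def apply_lws(self, lws: Dict[str, VDisk]) -> None:
        """Apply one incoming LWS (changed blocks per vDisk) to the staged vDisks."""
        for name, changed_blocks in lws.items():
            self.vdisks.setdefault(name, {}).update(changed_blocks)

    def snapshot(self) -> Dict[str, VDisk]:
        """Freeze the current staged state as a vDisk-backed snapshot."""
        snap = copy.deepcopy(self.vdisks)  # simplification: a plain copy
        self.snapshots.append(snap)
        return snap


if __name__ == "__main__":
    staging = StagingArea(vdisks={"disk0": {0: b"base"}})
    staging.apply_lws({"disk0": {1: b"a"}})    # first LWS arrives
    staging.apply_lws({"disk0": {0: b"new"}})  # second LWS overwrites block 0
    hydrated = staging.snapshot()              # hydration point
    staging.apply_lws({"disk0": {1: b"b"}})    # later LWS leaves the snapshot intact
    print(hydrated["disk0"], staging.vdisks["disk0"])
```

Each LWS only carries changes, so it is the accumulated state in the staging area, not any single LWS, that becomes a usable vDisk-backed snapshot.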

The source side doesn’t need to hydrate, as a vDisk based snapshot is taken there every hour.

Have questions? Please leave a comment.
