Stargate: A Five Star Chef

Before being hired at Nutanix, I discussed Medusa and Curator, two of the components that allow the Nutanix Complete Cluster to scale. After a week and a half of technical training, I can see why I didn't even attempt to explain Stargate until now. In training, the Stargate section took over two and a half hours to nail down its inner workings.

Since Nutanix is a distributed system, I think it helps to start with the hardware. One Nutanix Block contains four high-performance nodes wrapped in sheet metal. The only value the sheet metal provides is redundant power for the four nodes. While you need at least three nodes to get started, you can grow your system one node at a time, and there is no limit to how large your Nutanix Cluster can grow. Each node consists of two x86 processors, one PCI-SSD card, one SATA SSD and five 1 TB SATA HDDs. Stargate, Medusa and Curator together form the foundation of the controller VM that lives on each node.
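To make the building blocks concrete, here is a toy Python model of the hardware described above. The `Node` and `Cluster` classes, their fields and `MIN_NODES` are hypothetical names for illustration, not Nutanix APIs. The only rules encoded are the ones stated in this post: a three-node minimum, growth one node at a time, and five 1 TB HDDs per node.

```python
from dataclasses import dataclass, field

MIN_NODES = 3  # a cluster needs at least three nodes to get started


@dataclass
class Node:
    # Hypothetical model of one node's hardware, per the description above.
    cpus: int = 2
    pci_ssd_cards: int = 1
    sata_ssds: int = 1
    hdd_count: int = 5
    hdd_tb: int = 1

    @property
    def hdd_capacity_tb(self) -> int:
        return self.hdd_count * self.hdd_tb


@dataclass
class Cluster:
    nodes: list = field(default_factory=list)

    @classmethod
    def bootstrap(cls, count: int) -> "Cluster":
        # Refuse to start below the minimum node count.
        if count < MIN_NODES:
            raise ValueError(f"need at least {MIN_NODES} nodes, got {count}")
        return cls([Node() for _ in range(count)])

    def add_node(self) -> None:
        # Growth happens one node at a time.
        self.nodes.append(Node())

    def raw_hdd_tb(self) -> int:
        # Raw HDD capacity only; SSD tiers are left out of this toy model.
        return sum(n.hdd_capacity_tb for n in self.nodes)


cluster = Cluster.bootstrap(3)
print(cluster.raw_hdd_tb())  # 15
cluster.add_node()
print(len(cluster.nodes), cluster.raw_hdd_tb())  # 4 20
```

A full block of four nodes therefore carries 20 TB of raw SATA HDD capacity, before any replication overhead.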

Stargate is the Five Star Chef of the Nutanix Distributed File System. Stargate is responsible for receiving and processing all incoming data. It delivers a first-class product, serving its customers as fast as possible without sacrificing the quality and integrity of the final product. Like any good chef, you must taste-test while you're cooking: Stargate does its taste testing by computing checksums before writing data and by running periodic checks.
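The taste-testing idea can be sketched in a few lines of Python. This is a minimal illustration, assuming CRC32 (via `zlib.crc32`) as the checksum; the post does not say which algorithm Stargate uses, and the `ChecksummedStore` class is hypothetical.

```python
import zlib


class ChecksummedStore:
    """Hypothetical sketch: checksum each block before writing it,
    verify on read, and scrub everything periodically.
    CRC32 is an assumption; the real algorithm is not named in the post."""

    def __init__(self):
        self._blocks = {}  # block_id -> (data, checksum)

    def write(self, block_id, data: bytes) -> None:
        checksum = zlib.crc32(data)  # "taste" before it goes into the pot
        self._blocks[block_id] = (data, checksum)

    def read(self, block_id) -> bytes:
        data, checksum = self._blocks[block_id]
        if zlib.crc32(data) != checksum:
            raise OSError(f"checksum mismatch on block {block_id}")
        return data

    def scrub(self):
        """Periodic check: return ids of blocks whose data no longer matches."""
        return [bid for bid, (data, checksum) in self._blocks.items()
                if zlib.crc32(data) != checksum]


store = ChecksummedStore()
store.write("b1", b"mise en place")
store.write("b2", b"demi-glace")
# Simulate silent corruption of b2 while it sits on disk.
store._blocks["b2"] = (b"demi-glacE", store._blocks["b2"][1])
print(store.scrub())  # ['b2']
```

Verifying on every read catches bad data on the hot path, while the periodic scrub catches corruption in data nobody has read lately.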

Stargate is smart enough to realize that she can't do it all by herself. She relies on Medusa to keep track of where all the pots, pans, utensils and ingredients (the data) get placed over the course of the day. With Medusa's help, Stargate can quickly read data from multiple devices (PCI-SSD, SATA SSD, 1 TB SATA HDD). Stargate never wants to mess around, so all writes go to the PCI-SSD for the best performance.
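Here is a rough Python sketch of that division of labour: a metadata map standing in for Medusa records which device holds each extent, writes always land on the PCI-SSD, and reads ask the map first. The `Tier`, `MetadataMap` and `Stargate` classes are hypothetical; Medusa's actual schema is not described in this post.

```python
from enum import Enum


class Tier(Enum):
    PCI_SSD = "PCI-SSD"
    SATA_SSD = "SATA SSD"
    SATA_HDD = "1 TB SATA HDD"


class MetadataMap:
    """Stand-in for Medusa: remembers which device holds each extent."""

    def __init__(self):
        self._location = {}

    def record(self, extent_id, tier):
        self._location[extent_id] = tier

    def lookup(self, extent_id):
        return self._location[extent_id]


class Stargate:
    def __init__(self, metadata):
        self.metadata = metadata
        self.devices = {tier: {} for tier in Tier}  # one dict per device type

    def write(self, extent_id, data):
        # Per the post, every write lands on the PCI-SSD.
        self.devices[Tier.PCI_SSD][extent_id] = data
        self.metadata.record(extent_id, Tier.PCI_SSD)

    def read(self, extent_id):
        # Ask "Medusa" where the extent lives, then read from that device.
        tier = self.metadata.lookup(extent_id)
        return self.devices[tier][extent_id]


sg = Stargate(MetadataMap())
sg.write("e1", b"stock")
print(sg.metadata.lookup("e1").value)  # PCI-SSD

# Pretend an older extent was previously moved down to the capacity tier.
sg.devices[Tier.SATA_HDD]["e0"] = b"old recipe"
sg.metadata.record("e0", Tier.SATA_HDD)
print(sg.read("e0"))  # b'old recipe'
```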

On the other side of the coin, if Stargate has to read the same data repeatedly over a short period of time, it keeps that data in memory. This memory is referred to as the Extent Cache.
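A simple way to picture the Extent Cache is a least-recently-used (LRU) cache in front of the disks. This is a sketch under assumptions: the post does not say which eviction policy Stargate uses, so LRU, the `ExtentCache` class and its `capacity` parameter are illustrative choices.

```python
from collections import OrderedDict


class ExtentCache:
    """Minimal LRU sketch of an in-memory read cache.
    LRU eviction and the capacity value are assumptions."""

    def __init__(self, capacity, backend_read):
        self.capacity = capacity
        self.backend_read = backend_read  # called on a cache miss
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, extent_id):
        if extent_id in self._entries:
            self._entries.move_to_end(extent_id)  # mark as most recently used
            self.hits += 1
            return self._entries[extent_id]
        self.misses += 1
        data = self.backend_read(extent_id)
        self._entries[extent_id] = data
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return data


disk_reads = []


def slow_disk_read(extent_id):
    disk_reads.append(extent_id)
    return f"data-{extent_id}".encode()


cache = ExtentCache(capacity=2, backend_read=slow_disk_read)
for eid in ["a", "a", "b", "a", "c", "b"]:
    cache.read(eid)
print(cache.hits, cache.misses, disk_reads)  # 2 4 ['a', 'b', 'c', 'b']
```

The repeated reads of `a` are served from memory; `b` has to go back to disk because it was evicted to make room for `c`.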

Since Stargate is busy working away in the kitchen, things can get messy after a while. With all the writes happening on the PCI-SSD and Stargate pulling data up from the capacity tier to the performance tier, it's up to Curator to provide quality control. Curator periodically scans the metadata database and identifies cleanup and optimization tasks for Stargate to perform. One such task is moving cold data back down to the capacity tier. It's important to note that Stargate is still the only one doing the work; Curator can only strongly suggest what to do. It's also worth noting that data movement between the tiers is very granular: data moves between tiers in 16 MB chunks. The small chunk size helps ensure that only the hottest pieces of data persist on the PCI-SSD.
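The split between Curator suggesting and Stargate doing can be sketched like this. Assumptions to flag: "16 MB" is read as 16 MiB, "cold" is approximated as "not accessed within `cold_after` seconds", and the function names and timestamps are hypothetical. The post does not describe Curator's actual heat metric.

```python
CHUNK_SIZE = 16 * 1024 * 1024  # assumption: "16 MB" read as 16 MiB
MB = 1024 * 1024


def chunk_index(offset: int) -> int:
    """Map a byte offset to the 16 MB chunk that contains it."""
    return offset // CHUNK_SIZE


def curator_scan(last_access, hot_tier, now, cold_after):
    """Curator's role: suggest hot-tier chunks idle for more than cold_after
    seconds. It only returns suggestions; it never moves anything itself."""
    return sorted(c for c in hot_tier if now - last_access[c] > cold_after)


def stargate_apply(suggestions, hot_tier, cold_tier):
    """Stargate's role: actually move each suggested chunk down a tier."""
    for chunk in suggestions:
        hot_tier.discard(chunk)
        cold_tier.add(chunk)


# Chunks touched by data at offsets 0 MB, 20 MB and 40 MB.
hot = {chunk_index(0), chunk_index(20 * MB), chunk_index(40 * MB)}  # {0, 1, 2}
cold = set()
last_access = {0: 100, 1: 950, 2: 300}  # hypothetical timestamps in seconds

suggestions = curator_scan(last_access, hot, now=1000, cold_after=600)
stargate_apply(suggestions, hot, cold)
print(suggestions, hot, cold)  # [0, 2] {1} {0, 2}

# Granularity: a 1 GiB region spans 64 independently movable chunks.
print(chunk_index(1024 * MB - 1) + 1)  # 64
```

Because each chunk moves on its own, a large virtual disk with one busy region keeps only that region's chunks on the PCI-SSD.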

If there is a fire in the kitchen (a node failure), Curator and Stargate respond to two issues that may arise. First, when a guest VM begins reading its data across the network, Stargate begins migrating that data to the VM's new host, which improves performance for the guest VM. Second, Curator notices that a copy of some data is missing and instructs Stargate to create a second copy.
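The two responses can be modelled as a small simulation. This sketch assumes a replication factor of 2 (inferred from "a second copy") and uses hypothetical function names and a simple "pick the first node in sorted order" placement rule; the real placement and migration logic is not described in this post.

```python
REPLICATION_FACTOR = 2  # assumption: "a second copy" implies two copies


def find_under_replicated(replicas, live_nodes):
    # Curator's job: spot extents with fewer live copies than the target.
    return sorted(e for e, nodes in replicas.items()
                  if len(nodes & live_nodes) < REPLICATION_FACTOR)


def re_replicate(extent, replicas, live_nodes):
    # Stargate's job: copy from a survivor to a live node that lacks the extent.
    survivors = replicas[extent] & live_nodes
    candidates = sorted(live_nodes - survivors)  # assumes at least one exists
    replicas[extent] = survivors | {candidates[0]}


def localize(extent, reader, replicas, live_nodes):
    # A VM on `reader` is reading over the network: migrate a copy to its host.
    copies = (replicas[extent] & live_nodes) | {reader}
    if len(copies) > REPLICATION_FACTOR:
        remote = sorted(copies - {reader})
        copies.discard(remote[0])  # drop one remote copy to stay at the RF
    replicas[extent] = copies


replicas = {"e1": {"A", "B"}, "e2": {"A", "C"}, "e3": {"B", "C"}}
live = {"B", "C", "D"}  # node A has failed

before = find_under_replicated(replicas, live)
print(before)  # ['e1', 'e2']

localize("e3", "D", replicas, live)  # the VM restarted on D and reads e3
for extent in before:
    re_replicate(extent, replicas, live)
print(find_under_replicated(replicas, live))  # []
```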

If Stargate has problems but the node itself is fine, any guest VM using the local storage path is redirected to another controller VM. Guest VMs on the problematic node return to the local storage path only once Stargate has resolved all of its problems.
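The redirect-and-return behaviour boils down to a path-selection rule, sketched below. The `choose_cvm` function, the health dictionary and the "first healthy peer in sorted order" fallback are assumptions for illustration; the post does not describe how the alternate controller VM is picked.

```python
def choose_cvm(local_cvm, cvm_healthy, cluster_cvms):
    """Hypothetical storage path selection for a guest VM.
    Prefer the local controller VM while its Stargate is healthy;
    otherwise redirect to another healthy controller VM."""
    if cvm_healthy.get(local_cvm, False):
        return local_cvm
    for cvm in sorted(cluster_cvms):
        if cvm != local_cvm and cvm_healthy.get(cvm, False):
            return cvm
    raise RuntimeError("no healthy controller VM available")


health = {"cvm-1": True, "cvm-2": True, "cvm-3": True}
cvms = list(health)

print(choose_cvm("cvm-1", health, cvms))  # cvm-1
health["cvm-1"] = False  # Stargate on node 1 has problems; the node is fine
print(choose_cvm("cvm-1", health, cvms))  # cvm-2
health["cvm-1"] = True   # problems resolved
print(choose_cvm("cvm-1", health, cvms))  # cvm-1
```

Re-evaluating the rule on every health change is what sends the guest VM back to its local path once Stargate recovers.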

I hope this gives some insight into the distributed nature of Nutanix.

Thanks for reading,
