Binny Gill is a Director of Engineering at Nutanix. Truth is, I know he was promoted, but I don’t remember his new fancy title 🙂
I had the chance to talk with him about storage as a whole from the enterprise perspective. He mentioned how important it is to protect the data above anything else: resiliency over performance. Safe to say I think Nutanix has done that, and performance will continue to climb on existing gear with our NOS 4.0 release. (Note: the numbers are impressive now.)
I did want to share something from Binny if you’re thinking of storing any amount of data:
We are far superior to traditional filesystems because we are paranoid about silent corruptions and bitrot.
We keep the checksum separate from the data for all data and compare the checksum before returning data. Others use 520-byte sectors or more (528 bytes in IBM drives), but those things are not commodity and not broadly applicable. When a backup is taken, the checksum is compared, so we are sure that no bitrotted data spreads to backups.
We have disk scrubbing that makes sure that any bit flips are detected and repaired.
Our metadata is also self-checksumming and hence protected from bitrot.
Simply stated, we will never return corrupted data even if the drives are faulty. At best we will return no data.
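To make Binny's points a bit more concrete, here is a toy sketch of the general idea: checksums kept apart from the data, verified on every read, with a scrub pass that repairs bad copies. This is not Nutanix code. The names (`ChecksummedStore`, `CorruptionError`, `scrub`), the use of SHA-256, and the two in-memory "replicas" standing in for drives are all my own assumptions for illustration.

```python
import hashlib


class CorruptionError(Exception):
    """Raised instead of handing back data that fails its checksum."""


class ChecksummedStore:
    """Toy block store (hypothetical): checksums live apart from the data."""

    def __init__(self, replicas=2):
        # Each dict stands in for a copy of the data on a different drive.
        self.replicas = [dict() for _ in range(replicas)]
        # Checksums are stored separately from the blocks they describe.
        self.checksums = {}

    @staticmethod
    def _digest(data):
        # SHA-256 is an assumption; the post does not name the checksum used.
        return hashlib.sha256(data).hexdigest()

    def _valid(self, replica, key):
        data = replica.get(key)
        return data is not None and self._digest(data) == self.checksums[key]

    def write(self, key, data):
        self.checksums[key] = self._digest(data)
        for replica in self.replicas:
            replica[key] = bytes(data)

    def read(self, key):
        # Verify before returning; fall back to another copy if one is bad.
        for replica in self.replicas:
            if self._valid(replica, key):
                return replica[key]
        # No copy passes its checksum: return nothing rather than bad data.
        raise CorruptionError(f"no valid copy of {key!r}")

    def scrub(self):
        """Check every copy of every block; repair bad copies from a good one."""
        repaired = 0
        for key in self.checksums:
            good = next(
                (r[key] for r in self.replicas if self._valid(r, key)), None
            )
            if good is None:
                continue  # nothing trustworthy to repair from
            for replica in self.replicas:
                if not self._valid(replica, key):
                    replica[key] = good
                    repaired += 1
        return repaired


store = ChecksummedStore()
store.write("blk-1", b"retirement fund")
store.replicas[0]["blk-1"] = b"retirement fun!"  # simulate bitrot on one copy
data = store.read("blk-1")   # served from the healthy copy
fixed = store.scrub()        # rewrites the rotted copy from the good one
```

The important behavior is in `read`: if every copy fails its checksum, the caller gets an error, never the corrupted bytes.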
You can go from 6 TB to 6 PB+ with Nutanix, but it would be all for naught if we didn’t manage your data like your retirement fund.
[UPDATE: For DRAM, we use x4 dies and Chipkill, which is far superior to vanilla ECC. We can lose an entire die and still be able to correct the data.]
A great article on bit rot.