NOS 4.0 – Cluster Health – Slices & Dices

    It slices and it dices! Nutanix Cluster Health is a new feature that will be another great asset in maintaining availability for your Tier 1 workloads. Cluster Health lets you monitor and visualize the overall health of cluster nodes, VMs and disks from a variety of different views. Because you can set different HA requirements at the application level, Cluster Health visually dissects what's important and gives you guidance on how to take corrective action.

    NOS 4.0 - Cluster Health

    Multiple views to meet your needs

    Once inside the Cluster Health section in Prism, you have access to over 55 tests, and no additional setup is required beyond upgrading to 4.0.
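    Cluster Health's internals aren't described here, but the idea of many small tests rolling up into a per-entity view can be sketched. Below is a minimal Python sketch. It assumes each test yields an (entity, test name, status) result and that an entity is only as healthy as its worst test. The three-level status scale and the tuple layout are assumptions, not Prism's data model.

```python
from enum import IntEnum


class Status(IntEnum):
    # Higher value = worse, so max() picks the worst result.
    # This three-level scale is an assumption for illustration.
    GOOD = 0
    WARNING = 1
    CRITICAL = 2


def roll_up(results):
    """Return the worst status seen for each entity (node, VM or disk).

    `results` is a list of (entity, test_name, Status) tuples, a
    hypothetical stand-in for Cluster Health's test output.
    """
    worst = {}
    for entity, _test_name, status in results:
        worst[entity] = max(worst.get(entity, Status.GOOD), status)
    return worst


def cluster_status(results):
    """The cluster is only as healthy as its worst entity."""
    return max(roll_up(results).values(), default=Status.GOOD)


if __name__ == "__main__":
    sample = [
        ("node-1", "cvm_memory", Status.GOOD),    # hypothetical test names
        ("node-1", "disk_usage", Status.WARNING),
        ("disk-7", "smart_status", Status.CRITICAL),
    ]
    print(roll_up(sample)["node-1"].name)
    print(cluster_status(sample).name)
```

    A real health framework would also track per-test thresholds and history. The worst-status roll-up is just the simplest rule that matches a "red, yellow, green" view.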


    NOS 4.0 – Use PowerShell To Offload Tape Backup At A Remote Site

    NOS 4.0 introduced PowerShell as a third way of automating your Nutanix infrastructure. Also part of 4.0 was the ability to clone VMs without having to activate the whole protection domain. Before 4.0, this would interfere with replication.

    With these two features in our back pocket, we can automate getting your workloads backed up to tape with Veeam or other third-party backup software.

    1) Install the Nutanix cmdlets.
    Pretty easy, as they are wrapped up in an MSI that you can download from Prism. Installation prerequisites: PowerShell 2.0 or later and .NET 4.0.

    2) Connect to your Remote Nutanix Cluster
    Connect-NutanixCluster -server -U -P

    3) Get the last snapshot that was replicated
    $snaps = Get-SnapshotsForPd -Name
    $snaps[0] returns the most recent snapshot
    $snaps[0].vms.vmName returns the names of all the VMs in the protection domain

    4) Restore all the VMs for backup
    Restore-Entities -SnapshotId $snaps[0].snapshotID -VmNames $snaps[0].vms.vmName -name -PathPrefix \restore

    You can see the action taking place in Prism.

    The backup software should be able to pick them up from here. If the remote site has active workloads, you could get fancy and throttle the disk IO with shares/limits too.
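    If the restored VMs compete with production workloads at the remote site, shares and limits decide who gets disk IO under contention. Below is a simplified proportional-share model, not ESXi's actual scheduler. Capacity is split by shares. Any VM whose slice would exceed its limit is pinned at that limit, and the leftover is redistributed among the rest. The VM names and IOPS numbers are hypothetical.

```python
def allocate_iops(capacity, vms):
    """Split `capacity` IOPS across VMs in proportion to their shares,
    never giving a VM more than its limit (None means unlimited).

    `vms` maps name -> (shares, limit). Simplified model of
    proportional sharing under contention; shares must be positive.
    """
    alloc = {}
    remaining = float(capacity)
    active = dict(vms)
    while active:
        total_shares = sum(shares for shares, _limit in active.values())
        # VMs whose proportional slice would exceed their limit.
        capped = {
            name: limit
            for name, (shares, limit) in active.items()
            if limit is not None and remaining * shares / total_shares > limit
        }
        if not capped:
            # Nobody hits a limit: hand out the rest by shares.
            for name, (shares, _limit) in active.items():
                alloc[name] = remaining * shares / total_shares
            return alloc
        # Pin capped VMs at their limit and redistribute what is left.
        for name, limit in capped.items():
            alloc[name] = float(limit)
            remaining -= limit
            del active[name]
    return alloc


if __name__ == "__main__":
    # Hypothetical: a backup VM throttled to 1,000 IOPS next to two production VMs.
    print(allocate_iops(10000, {
        "backup": (500, 1000),
        "sql": (1000, None),
        "exch": (1000, None),
    }))
```

    In this example the backup VM is held to its 1,000 IOPS limit, and the two production VMs split the remaining 9,000 evenly.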

    All of the cmdlets are built on the Nutanix REST API, so anything you see in the UI you can automate.
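    The `$snaps[0]` step relies on the cmdlet returning snapshots newest-first. If you drive this from another language, or just want to be defensive, you can pick the newest snapshot by its creation time instead of its position in the list. Below is a minimal Python sketch. The `Snapshot` fields (`snapshot_id`, `created`, `vm_names`) are hypothetical stand-ins, not the properties of the real cmdlet objects.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Snapshot:
    # Hypothetical shape; the real cmdlet objects expose different properties.
    snapshot_id: str
    created: datetime
    vm_names: list = field(default_factory=list)


def latest_snapshot(snapshots):
    """Pick the newest snapshot without relying on list order."""
    if not snapshots:
        raise ValueError("protection domain has no snapshots")
    return max(snapshots, key=lambda s: s.created)


def restore_plan(snapshots, path_prefix="\\restore"):
    """Return (snapshot_id, vm_names, path_prefix) to feed a restore call."""
    snap = latest_snapshot(snapshots)
    return snap.snapshot_id, list(snap.vm_names), path_prefix


if __name__ == "__main__":
    snaps = [
        Snapshot("snap-a", datetime(2014, 4, 14, 22, 0), ["vm1"]),
        Snapshot("snap-b", datetime(2014, 4, 15, 22, 0), ["vm1", "vm2"]),
    ]
    print(restore_plan(snaps))
```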


    NOS 4.0 – When is it safe to upgrade the hypervisor?

    Prior to NOS 4.0, if you were rebooting the CVMs (Controller Virtual Machines) while upgrading the hypervisor, you had to run NCLI (the Nutanix command line) to check the cluster status.

    Today it’s front and center.


    How many hosts you can take down before the cluster is impacted.

    Clicking on the image brings up lots of information about what may be affecting the cluster's fault domains. That information includes:

    Fault Domain Type: Component

    Extent Groups – Based on placement of extent group replicas the cluster can tolerate a maximum of X node failure(s)

    Oplog – Based on the placement of oplog episodes the cluster can tolerate a maximum of X node failure(s)

    Metadata – All metadata ring partitions are fault tolerant

    Free Space – Cluster has enough free space (X TB) to tolerate X node failure(s)

    Fault Domain Type: Block

    Extent Groups – Rackable unit aware data placement is disabled – Not enough rackable units in the cluster (Need 3 blocks for availability domains to work)

    Oplog – Rackable unit aware data placement is disabled – Not enough rackable units in the cluster

    Metadata – Metadata ring partitions with nodes: X,X,X,X are not fault tolerant.

    ZooKeeper – Rackable unit aware Zookeeper placement is disabled – Not enough rackable units in the cluster

    Free Space – Rackable unit aware data placement is disabled – Not enough rackable units in the cluster
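    These messages boil down to simple rules. Each component reports how many node failures it can survive, the cluster as a whole is limited by its weakest component, and block awareness needs enough blocks. Below is a minimal Python sketch of that reasoning. It assumes the free-space check pessimistically loses the largest nodes first and ignores reserved space. The function names are hypothetical, and this is not Nutanix's exact calculation. The 3-block minimum comes from the message above.

```python
def free_space_tolerance(node_capacity_tb, used_tb):
    """How many node failures the cluster can absorb before the data no
    longer fits on the surviving nodes.

    `used_tb` is the full replicated footprint (all copies). Pessimistic
    sketch: the largest nodes are assumed to fail first.
    """
    remaining = sorted(node_capacity_tb)  # ascending, so pop() drops the largest
    failures = 0
    while len(remaining) > 1:
        remaining.pop()
        if sum(remaining) < used_tb:
            break
        failures += 1
    return failures


def cluster_tolerance(component_tolerance):
    """The cluster tolerates only as many failures as its weakest component."""
    return min(component_tolerance.values())


def block_aware_enabled(node_to_block, min_blocks=3):
    # Prism's message cites 3 blocks as the minimum for block awareness.
    return len(set(node_to_block.values())) >= min_blocks


if __name__ == "__main__":
    components = {
        "extent_groups": 1,
        "oplog": 1,
        "metadata": 1,
        "free_space": free_space_tolerance([10, 10, 10, 10], 25),
    }
    print(cluster_tolerance(components))
    print(block_aware_enabled({"n1": "A", "n2": "A", "n3": "B", "n4": "B"}))
```

    For example, four 10 TB nodes holding 25 TB of data can lose only one node on free space alone. So even if replication allowed more, the cluster as a whole would report a tolerance of one failure.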


    Blue + Yellow = Green – @Nutanix and @Veeam Paint By Number Disaster Recovery

    This week I get a second opportunity to present with Jason Acord from Veeam and talk about how companies are tackling business continuity with our paint-by-number approach. Infrastructure and disaster recovery come together to provide a simple yet highly powerful Tier 1 standard for delivering applications end to end.

    I will be talking about:

    * Speed of Nutanix with Customer Choice
    * 55 Health Tests Nutanix provides
    * Speed Enhancements for Tier 1 Applications
    * Per VM replication
    * Offloading Tape to the Secondary Site
    * Limitless Site Recovery Topologies
    * Fast Backup with Veeam and intelligent Flash

    Nutanix Per VM Replication – Byte Level Replication.

    When: Wednesday April 16th 1 p.m. ET
    Sign Up Here


    What Would Tony Do? vSwitch0 on Nutanix

    Tony Holland – Nutanix Sr SE with a common-sense approach to infrastructure.

    Q: What are the default vSwitch0 settings for a new install? What physical adapters are included in vSwitch0, and what is the default teaming policy?

    All NICs are part of vSwitch0.

    By default, the two 10GbE NICs are active and the two 1GbE NICs are standby. I use the 1GbE NICs for setup and then simply unplug them afterward.

    I always create a new port group under vSwitch0 called Nutanix.

    I then change the NIC teaming so that vmnic0 is active and all the others are standby, and on the VM Network port group I move vmnic1 to active with the others on standby. This setup has worked well for me.
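    To see why this layout is resilient, here is a simplified Python model of an explicit failover order. Healthy active NICs carry traffic, and each failed active NIC is replaced by the next healthy standby NIC in the listed order. The model assumes vmnic0 and vmnic1 are the two 10GbE ports and vmnic2/vmnic3 are the unplugged 1GbE ports. The post doesn't state that mapping, and this is not ESXi's exact teaming logic.

```python
def uplinks_in_use(active, standby, link_up):
    """Simplified explicit-failover model for one port group.

    Healthy active NICs carry traffic; each failed active NIC is replaced
    by the next healthy standby NIC in list order. Sketch only.
    """
    in_use = [nic for nic in active if link_up.get(nic, False)]
    failed = len(active) - len(in_use)
    spares = [nic for nic in standby if link_up.get(nic, False)]
    return in_use + spares[:failed]


if __name__ == "__main__":
    # Assumed mapping: vmnic0/1 = 10GbE, vmnic2/3 = 1GbE (unplugged after setup).
    links = {"vmnic0": True, "vmnic1": True, "vmnic2": False, "vmnic3": False}
    nutanix_pg = (["vmnic0"], ["vmnic1", "vmnic2", "vmnic3"])
    vm_network_pg = (["vmnic1"], ["vmnic0", "vmnic2", "vmnic3"])
    print(uplinks_in_use(*nutanix_pg, links))
    print(uplinks_in_use(*vm_network_pg, links))
    links["vmnic0"] = False  # simulate losing a 10GbE link
    print(uplinks_in_use(*nutanix_pg, links))
```

    Normally the Nutanix port group runs on vmnic0 and VM traffic runs on vmnic1. If vmnic0 loses link, the Nutanix port group fails over to vmnic1, so each 10GbE port backs up the other.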



    VMware Horizon View 6: Cloud Pods & Windows 2012R2

    Since I only got accepted into the beta the evening before the NDA was lifted, I didn't have a lot of time to see and test everything. RDSH support is getting all of the hoopla, but for me two bucket-list items finally saw the light of day.

    The big one for me was Horizon View Cloud Pod. I've been waiting a long time to see this come out because it was a problem I self-inflicted on my former employer in 2009. I still see lots of people wanting to stretch their View Connection Servers across sites, which is a no-no because the Java Message Service (JMS) traffic between them needs less than 4 ms of latency to behave well. Now you can have 4 pods across two sites servicing 20,000 users.

    Cloud Pod


    The first glimpse of this came two VMworlds ago in the session "Demystifying Large Scale Enterprise View Architecture: Illustrated with Lighthouse Case Studies" with John Dodge. Active/active DR made easy. F5 and NetScaler may still have a place to play, but I am not sure yet.

    You can assign sites to your pods, and users can have a home site. A home site is the affinity between a user and a Cloud Pod Architecture site. Home sites ensure that users always receive desktops from a particular datacenter, even when they are traveling. If a home site is not set up, the Cloud Pod Architecture feature delivers the nearest available desktop in the pod federation. If all of the desktops in the local datacenter are in use, the Cloud Pod Architecture feature selects a desktop from the other datacenter.
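    The selection rules above can be sketched as a small function. Below is a minimal Python sketch that assumes each site is described by a distance from the user and a count of free desktops. The post doesn't say what happens when the home site is full, so falling back to the normal nearest-site search is an assumption. The names are hypothetical, not part of any Horizon API.

```python
def pick_site(sites, home_site=None):
    """Choose the site that serves a desktop request.

    `sites` maps site name -> (distance_from_user, free_desktops).
    The home site wins if it has a free desktop; otherwise the nearest
    site with a free desktop is used (fallback is an assumption).
    Returns None when no desktop is free anywhere.
    """
    if home_site is not None and sites.get(home_site, (0, 0))[1] > 0:
        return home_site
    candidates = [(dist, name) for name, (dist, free) in sites.items() if free > 0]
    if not candidates:
        return None
    return min(candidates)[1]  # smallest distance wins


if __name__ == "__main__":
    # Hypothetical two-datacenter federation; the user is local to NYC.
    sites = {"NYC": (0, 0), "CHI": (800, 5)}
    print(pick_site(sites))  # local datacenter is full, so the other one serves
```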

    The second great thing is support for the Windows Server 2012 and 2012 R2 Active Directory Domain Services domain functional levels. You can finally install the Connection Server on Windows Server 2012/2012 R2.

    Great day for VMware View shops


    View 5.3.1 and Windows 8.1

    Some interesting info regarding using Windows 8.1 with VMware View:

    To upgrade a desktop from Windows 8 to Windows 8.1, you must uninstall View Agent, upgrade the operating system from Windows 8 to Windows 8.1, and then reinstall View Agent. Alternatively, you can perform a fresh installation of Windows 8.1 and then install View Agent.

    When you install View Agent in a Windows 8.1 virtual machine, the installer allows you to select the View Persona Management feature, but this feature is not supported for Windows 8.1 desktops. If you select the View Persona Management feature in the View Agent installer, the feature is not installed.

    Reconnecting to a Windows 8 desktop session over PCoIP can take up to 24 seconds. While waiting to reconnect, the user sees a black screen and a mouse pointer. This issue only occurs when a user reconnects to an active desktop session. It does not occur when a user logs off and logs in again.

    Info about getting the update:
    “The Update” For Windows Server 2012 R2, Windows 8.1, and Windows RT 8.1 Is GA … But Not For Everyone


    #Nutanix 1 TB HDD Rebuild in 3 min – Video

    The video below shows the impact of a drive rebuild when the cluster is under no load. I used the Xangati dashboard to track internode communication. Nutanix does apply QoS to drive rebuilds. The important thing is that HDDs rebuild to other HDDs and don't hurt performance by flooding the flash tier. This data is considered cold, so it shouldn't impact the write path.

    A Nutanix cluster is limited to roughly 40 MB/s per node when rebuilding HDDs. Because the data is spread evenly across the cluster, every node contributes to the rebuild, so rebuild bandwidth grows linearly with cluster size. More info here.

    * The HDD being rebuilt is 1 TB. It had 24 GB of data sitting on it.
    * The HDD was removed using the UI.
    * The 3-minute time is for the rebuild itself. NOS then performs another map-reduce job to verify that everything has been rebuilt before releasing the drive. This secondary map-reduce job is considered low priority because, in this case, the data still exists.
    * To add the HDD back into the cluster, you have to use edit-zeus to remove the tombstone record, as this isn't considered a normal procedure.
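    The 40 MB/s per-node figure makes the rebuild math easy to check. Below is a back-of-the-envelope Python sketch. It assumes every participating node rebuilds at the full cap in parallel, uses 1 GB = 1024 MB, and ignores the verification pass. The post doesn't give the cluster's node count, so the second function works backward from the measured time instead.

```python
MB_PER_GB = 1024  # binary units; use 1000 for decimal GB


def rebuild_seconds(data_gb, nodes, mb_s_per_node=40):
    """Estimated time to re-protect `data_gb` when `nodes` each rebuild at the cap."""
    return data_gb * MB_PER_GB / (nodes * mb_s_per_node)


def implied_nodes(data_gb, seconds, mb_s_per_node=40):
    """How many nodes' worth of rebuild bandwidth a measured time implies."""
    return data_gb * MB_PER_GB / seconds / mb_s_per_node


if __name__ == "__main__":
    print(round(implied_nodes(24, 180), 2))  # 24 GB rebuilt in 3 minutes
    print(rebuild_seconds(24, 4), rebuild_seconds(24, 8))
```

    Working backward, 24 GB in 3 minutes is about 137 MB/s, or roughly 3.4 nodes' worth of the 40 MB/s budget. Doubling the node count halves the estimate, which is the linear behavior described above.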


    VMware Horizon Special Online Event: vGPU announced, what is left?

    Pat Gelsinger, CEO, Sanjay Poonen, EVP and GM End-User Computing, and Sumit Dhawan, VP and GM of Desktop Products, deliver some news about how VMware is transforming desktops and applications. I look forward to hearing the announcement and to watching VMware and Citrix continue to play leapfrog with each other. I couldn't imagine the desktop business without either of these powerhouses.


    Source: http://seekingalpha.com/

    Good Read: Why Nvidia’s Partnership With VMware Is A Big Deal


    Binny Gill on your Data Center's Silent Killer

    Binny Gill is a Director of Engineering at Nutanix. Truth is, I know he was promoted, but I don't remember his new fancy title :-)

    I had the chance to talk with him about storage as a whole from the enterprise perspective. He stressed how important it is to protect the data above anything else: resiliency over performance. Safe to say, I think Nutanix has done that, and performance will continue to climb on existing gear with our NOS 4.0 release. (Note: the numbers are already impressive.)

    I did want to share something from Binny if you're thinking of storing any amount of data:

    We are far superior to traditional filesystems because we are paranoid about silent corruption and bitrot.

    We keep checksums separate from the data, for all data, and compare the checksum before returning data. Others use 520-byte sectors or larger (528 bytes in IBM drives), but those are not commodity and not broadly applicable. When a backup is taken, the checksum is compared, so we are sure that bitrotted data does not spread to backups.

    We have disk scrubbing that makes sure that any bit flips are detected and repaired.

    Our metadata is also self-checksumming and hence protected from bitrot.

    Simply stated, we will never return corrupted data even if the drives are faulty. At worst, we will return no data.
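    Binny's points can be illustrated with a toy store: checksums kept apart from the data, verification on every read, scrubbing that repairs bad copies, and returning nothing rather than corrupt data. Below is a minimal Python sketch. It assumes two replicas per block and uses CRC32 as the checksum. Neither the checksum algorithm nor the replica layout is stated in the post, and this is not Nutanix's implementation.

```python
import zlib


class ChecksummedStore:
    """Toy replicated block store with checksums kept apart from the data.

    CRC32 and two replicas are illustrative assumptions.
    """

    def __init__(self, replicas=2):
        self.replicas = [dict() for _ in range(replicas)]  # block_id -> bytes
        self.checksums = {}  # block_id -> crc32, stored separately from data

    def write(self, block_id, data):
        self.checksums[block_id] = zlib.crc32(data)
        for replica in self.replicas:
            replica[block_id] = bytes(data)

    def _ok(self, block_id, data):
        return data is not None and zlib.crc32(data) == self.checksums[block_id]

    def read(self, block_id):
        """Return verified data, or None rather than corrupt bytes."""
        for replica in self.replicas:
            data = replica.get(block_id)
            if self._ok(block_id, data):
                return data
        return None

    def scrub(self):
        """Re-verify every copy and repair bad ones from a good copy."""
        repaired = 0
        for block_id in self.checksums:
            good = self.read(block_id)
            if good is None:
                continue  # no intact copy left to repair from
            for replica in self.replicas:
                if not self._ok(block_id, replica.get(block_id)):
                    replica[block_id] = good
                    repaired += 1
        return repaired


if __name__ == "__main__":
    store = ChecksummedStore()
    store.write("b1", b"hello")
    store.replicas[0]["b1"] = b"hellp"  # simulate bit rot on one copy
    print(store.read("b1"), store.scrub())
```

    The design choice that matters is where the checksum lives. Because it is stored apart from the data, a rotted block cannot carry a matching checksum along with it, so a read either returns verified bytes or nothing.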

    You can go from 6 TB to 6 PB+ with Nutanix, but it would all be for naught if we didn't manage your data like your retirement fund.

    Additional Resource:
    A great article on bit rot.