Delete AFS Forcefully

    There can be instances when graceful removal of a file server does not work and you see the following error. This can happen when the file server is not available and was deleted without following the right process, for example when the FSVMs were deleted directly instead of through the delete workflow in the File Server section of Prism.

    ncli fs delete uuid=________________________
    Error: File server with uuid xxxxx-xxxxx-xxxxx is offline and unable to fetch file server VMs.

    Use the following command to delete the file server permanently from the Minerva database. Run it from any CVM.
    minerva --fs_uuids _________________ force_fileserver_delete

    The file server UUID can be obtained from the ncli fs ls command.


    Powering Off and Starting Up AFS – Native File Services on Nutanix

    There really isn’t a need to shut down AFS, but moves and maintenance are a part of life. Here are the steps for a clean shutdown.


    Shutting Down:
    • Power off all guest VMs on the cluster, leaving only FSVMs and CVMs powered on.
    • From any CVM run: minerva -a stop
    • The stop command will stop AFS services and power off the FSVMs for all file servers.
    • Once only the CVMs remain powered on, run cluster stop from any CVM.
    • Power off the CVMs and hosts.

    Starting Up:
    • Power on Hosts
    • CVMs will auto-start once the Host is up
    • Once all CVMs are up, run cluster start from any CVM to initiate cluster services
    • Verify all services are up with cluster status
    • From any CVM run: minerva -a start
    • The start command will power on the FSVMs for all File Servers and start AFS services
    • Power on all remaining guest VMs
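
    The ordering above matters more than any single command, so it can help to keep it as a printable runbook. The sketch below is only a dry-run helper: it prints the steps in order and runs nothing. The Step class and render function are hypothetical names for illustration; the commands are the ones listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    where: str   # where the action is performed
    action: str  # command or manual action taken from the steps above


# Order matters: guest VMs first, then AFS, then cluster services, then hardware.
SHUTDOWN = [
    Step("Prism", "power off all guest VMs (leave FSVMs and CVMs on)"),
    Step("any CVM", "minerva -a stop"),
    Step("any CVM", "cluster stop"),
    Step("each host", "power off the CVMs and hosts"),
]

# Startup is the reverse: hardware, cluster services, AFS, guest VMs.
STARTUP = [
    Step("each host", "power on hosts (CVMs auto-start)"),
    Step("any CVM", "cluster start"),
    Step("any CVM", "cluster status"),
    Step("any CVM", "minerva -a start"),
    Step("Prism", "power on all remaining guest VMs"),
]


def render(title, steps):
    """Return the runbook as numbered text; nothing is executed."""
    lines = [title]
    for number, step in enumerate(steps, 1):
        lines.append(f"{number}. [{step.where}] {step.action}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render("Shutting Down", SHUTDOWN))
    print(render("Starting Up", STARTUP))
```

    Keeping the two lists side by side makes it easy to see that startup mirrors shutdown.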



    Will DR and Backup power AHV sales to 50%?

    .Next has come and gone like your favorite holiday: tons of hustle and bustle with great euphoric feelings, followed by hitting a wall and being extremely tired. The Nutanix conference was chock-full of announcements, but the most powerful to me were the ones related to DR and backup. AHV, built with cloud in mind, is surging, but adoption has been slowed by limited third-party backup support. You can have this great automated hypervisor with best-in-class management, but if you can’t back it up easily, adoption will suffer.

    This number will grow rapidly now with all of the backup and DR options

    So before .Next 2017, DR and backup options for AHV included:
    • Commvault with support for IntelliSnap
    • Time Stream – Native DR with Time Stream
    o 1 node backup with the NX-1155
    o Backup/DR to Storage Only Clusters
    o Cloud Connect to AWS and Azure
    o DR/Backup to a full cluster.
    • Any backup software with agents

    After .Next 2017, announcements for AHV backup and DR support include:
    • HYCU from Comtrade – Rapidly deployed software using turn-key appliances. Great choice if you have some existing hardware that you can use or place onto Nutanix. Point, click, done. Check out more here.
    • Rubrik – Hardware-based appliances that do the heavy lifting for you. Check out more here.
    • Veeam – Probably best known for making backup easy on ESXi, Veeam has announced support for AHV later this year. Nutanix added Veeam as a Strategic Technology Partner within the Nutanix Elevate Alliance Partner Program. Going Green!
    • Druva – Nutanix users can now take full advantage of Druva Phoenix’s unique cloud-first approach, with centralized data management and security. Today it supports only ESXi, plus agents on AHV, but agentless support is coming. More here.
    • Near-sync – Backup and DR to full Nutanix clusters get near-sync to achieve very low RPO. Read more on near-sync here.
    • Xi Cloud Services – A native cloud extension to the Nutanix Enterprise Cloud Platform that powers more than 6000 end-customers around the globe. This announcement marks another significant step towards the realization of our Enterprise Cloud vision: delivering a true cloud experience for any application, in any deployment model, using an open platform approach. For the first time, Nutanix software will be able to be consumed as a cloud service.

    Maybe 50% for AHV is a lofty goal, but I can see 40% of new sales by next year as people focus on their business rather than the day-to-day headaches. With very strong backing in backup and DR, AHV growth will flourish.


    The Down Low on Near-Sync On Nutanix

    Nutanix refers to its current implementation of redirect-on-write snapshots as vDisk based snapshots. Nutanix has continued to improve its snapshot implementation by adding Light-Weight Snapshots (LWS) to provide near-sync replication. LWS uses markers instead of creating full snapshots for RPOs of 15 minutes and under. LWS further reduces the overhead of managing metadata and removes the overhead caused by the long snapshot chains that a high number of frequent snapshots would create. The administrator doesn’t have to worry about setting a policy to choose between vDisk snapshots and LWS. Acropolis Operating System (AOS) will transition between the two forms of replication based on the RPO and available bandwidth. If the network can’t handle the low RPO, replication will transition out of near-sync. When the network can meet the near-sync requirements again, AOS will start using LWS again. In over-subscribed networks, near-sync can provide almost the same level of protection as synchronous replication without impacting the running workload.

    The administrator only needs to set the RPO; no knowledge of near-sync is needed.

    The tradeoff is that all changes are handled in SSD when near-sync is enabled. Because of this tradeoff, Nutanix reserves a percentage of SSD space for LWS when it’s enabled.


    In the above diagram, a vDisk based snapshot is first taken and replicated to the remote site. Once the full replication is complete, LWS will begin on the set schedule. If there is no remote site set up, LWS will happen locally right away. If you have the bandwidth available, life is good, but that’s not always the case in the real world. If you miss your RPO target repeatedly, replication will automatically transition back to vDisk based snapshots. Once vDisk based snapshots complete fast enough, it will automatically transition back to near-sync. Transitioning both out of and into near-sync is controlled by advanced settings called gflags.
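
    The back-and-forth between vDisk based snapshots and near-sync can be pictured as a small two-state machine. The sketch below is only an illustration of that idea, not how AOS implements it: the class name, the record method, and the two thresholds are hypothetical stand-ins for whatever the real gflags control.

```python
from enum import Enum


class Mode(Enum):
    VDISK = "vDisk based snapshots"
    NEAR_SYNC = "near-sync (LWS)"


class ReplicationMode:
    """Toy model: leave near-sync after repeated RPO misses, re-enter after
    repeated fast vDisk snapshots. Thresholds are made-up illustration values."""

    def __init__(self, misses_to_exit=3, hits_to_enter=3):
        self.mode = Mode.VDISK          # replication starts with a vDisk snapshot
        self.misses_to_exit = misses_to_exit
        self.hits_to_enter = hits_to_enter
        self._streak = 0                # consecutive misses or hits, per mode

    def record(self, met_rpo):
        """Record one replication cycle and return the resulting mode."""
        if self.mode is Mode.NEAR_SYNC:
            # In near-sync we count consecutive missed RPO targets.
            self._streak = 0 if met_rpo else self._streak + 1
            if self._streak >= self.misses_to_exit:
                self.mode, self._streak = Mode.VDISK, 0
        else:
            # In vDisk mode we count consecutive snapshots that were fast enough.
            self._streak = self._streak + 1 if met_rpo else 0
            if self._streak >= self.hits_to_enter:
                self.mode, self._streak = Mode.NEAR_SYNC, 0
        return self.mode


if __name__ == "__main__":
    tracker = ReplicationMode(misses_to_exit=2, hits_to_enter=2)
    for met in [True, True, False, False, True, True]:
        print(met, "->", tracker.record(met).value)
```

    Counting consecutive results rather than single events keeps one slow cycle from flapping the mode, which matches the "repeatedly" wording above.
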
    On the destination side, AOS creates hydration points. Hydration points are a way for LWS to transition into a vDisk based snapshot. The process for inline hydration is to:

    1. Create a staging area for each VM (CG) that’s protected by the protection domain.
    2. The staging area is essentially a directory with a set of vdisks for the VM.
    3. Afterwards, any new incoming LWSs are applied to the same set of vdisks.
    4. The staging area can be snapshotted from time to time, which gives you individual vdisk-backed snapshots.

    The source side doesn’t need to hydrate as a vDisk based snapshot is taken every hour.
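
    The hydration steps can be sketched with plain dictionaries. This is a toy model under stated assumptions, not the AOS data structures: StagingArea, apply_lws, and snapshot are hypothetical names, and an LWS is represented simply as a map of vdisk name to changed blocks.

```python
class StagingArea:
    """One staging area per protected VM: a set of vdisks kept up to date."""

    def __init__(self, vdisk_names):
        # Each vdisk is modeled as {block offset: data}.
        self.vdisks = {name: {} for name in vdisk_names}
        self.snapshots = []

    def apply_lws(self, lws):
        """Apply an incoming LWS, given as {vdisk name: {offset: data}}."""
        for name, writes in lws.items():
            self.vdisks[name].update(writes)

    def snapshot(self):
        """Freeze the current vdisk contents as a hydration point."""
        point = {name: dict(blocks) for name, blocks in self.vdisks.items()}
        self.snapshots.append(point)
        return point


if __name__ == "__main__":
    area = StagingArea(["disk0"])
    area.apply_lws({"disk0": {0: b"a", 4096: b"b"}})
    first = area.snapshot()
    area.apply_lws({"disk0": {0: b"c"}})   # later LWS overwrites block 0
    print(first["disk0"][0], area.vdisks["disk0"][0])
```

    The snapshot copies the block maps, so later LWS updates do not alter an earlier hydration point.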

    Have questions? Please leave a comment.


    ROBO Deployments & Operations Best Practices on Nutanix

    The Nutanix platform’s self-healing design reduces operational and support costs, such as unnecessary site visits and overtime. With Nutanix, you can proactively schedule projects and site visits on a regular cadence, rather than working around emergencies. Prism, our end-to-end infrastructure management tool, streamlines remote cluster operations via one-click upgrades, while also providing simple orchestration for multiple cluster upgrades. Following the best practices in this new document ensures that your business services are quickly restored in the event of a disaster. The Nutanix Enterprise Cloud Platform makes deploying and operating remote and branch offices as easy as deploying to the public cloud, but with control and security on your own terms.

    One section I would like to call out in the doc is how to seed your customer data if you’re dealing with poor WAN links.

    Seed Procedure

    The following procedure lets you use seed cluster (SC) storage capacity to bypass the network replication step. In the course of this procedure, the administrator stores a snapshot of the VMs on the SC while it’s installed in the ROBO site, then physically ships it to the main datacenter.

    1. Install and configure application VMs on a ROBO cluster.
    2. Create a protection domain (PD) called PD1 on the ROBO cluster for the VMs and volume groups.
    3. Create an out-of-band snapshot S1 for the PD on ROBO with no expiration.
    4. Create an empty PD called PD1 (same name used in step 2) on the SC.
    5. Deactivate PD1 on the SC.
    6. Create remote sites on the ROBO cluster and the SC.
    7. Retrieve snapshot S1 from the ROBO cluster to the SC (via Prism on the SC).
    8. Ship the SC to the datacenter.
    9. ReIP the SC.
    10. Create remote sites on the SC cluster and on the datacenter main cluster (DC1).
    11. Create PD1 (same name used in steps 2 and 4) on DC1.
    12. Deactivate PD1 on DC1.
    13. Retrieve S1 from the SC to DC1 (via Prism on DC1). Prism generates an alert here; although it appears to be a full data replication, the SC transferred metadata information only.
    14. Create remote sites on DC1 and the ROBO cluster.
    15. Set up a replication schedule for PD1 on the ROBO cluster in Prism.
    16. Once the first scheduled replication is successful, you can delete snapshot S1 to reclaim space.
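
    To judge whether seeding is worth the shipping effort, a rough estimate of the initial replication time over the WAN helps. The sketch below is simple arithmetic, not something from the best practices document; the function name, the 80% link efficiency, and the example numbers are all assumptions for illustration.

```python
def transfer_days(data_tb, link_mbps, efficiency=0.8):
    """Estimate days needed to push data_tb terabytes over a WAN link.

    data_tb    -- data to replicate, in decimal terabytes (assumed value)
    link_mbps  -- nominal link speed in megabits per second (assumed value)
    efficiency -- fraction of the link actually usable (assumed, default 0.8)
    """
    bits = data_tb * 1e12 * 8
    usable_bits_per_second = link_mbps * 1e6 * efficiency
    return bits / usable_bits_per_second / 86400


if __name__ == "__main__":
    # Hypothetical ROBO site: 10 TB of data behind a 20 Mbps link.
    print(f"{transfer_days(10, 20):.1f} days")
```

    With those made-up numbers the first full replication would take roughly 58 days, which is the kind of result that makes shipping a seed cluster attractive.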

    To get all of the best practices, please download the full document here: https://portal.nutanix.com/#/page/solutions/details?targetId=BP-2083-ROBO-Deployment:BP-2083-ROBO-Deployment


    HYCU for You: Icing on the cake for AHV


    HYCU is a purpose-built application data protection solution for Nutanix. HYCU is coming out of the gate with support for AHV and some key value propositions in mind:
    a. 100% Application-focus
    b. Backup to NAS &/or Cloud
    c. Built to be hypervisor-agnostic. Today it uses changed region tracking APIs available from AOS. Over time HYCU will use those same APIs for other hypervisors.
    d. Recover in <2 minutes, deploy in <3 minutes, and learn in <4 minutes.

    HYCU is developed by Comtrade Software, a Boston-based company. They also develop monitoring solutions like SCOM management packs and Microsoft OMS solutions for Nutanix. Comtrade has really become a part of Nutanix during the development phase. The Slack channel between the two companies was great for tracking progress, not to mention software that met its release date ahead of schedule!

    Pick Your Backup Destination?

    HYCU provides classic backup and restore through simple and intuitive workflows. You can pick from a variety of targets to store your data.
    • Backup data within datacenter and/or to the cloud
    o Nutanix storage
    o Third party storage – If you got it, use it.
    • Cloud storage – Efficient backup to AWS and Azure that does not require a cloud-based VM. In most cases the running VM is more costly than the storage, so this is a great feature.

    Other use cases
    • Application discovery
    o Compliance
    • Enabling self-service for VM & App/DB Administrators
    o Power to protect against impact of patches / upgrades
    o Protects SQL out of the box
    o Rapid, context sensitive restores
    • Restore to alternative location for test / debug / reporting / verification
    • Full automation / orchestration through REST API integration

    I believe what Veeam did for VMware early on can happen again with HYCU for Nutanix. As more and more backup options hit the market for AHV, it will be interesting to follow this. If you want to take it for a spin on your Nutanix CE cluster, sign up here: https://www.comtradesoftware.com/free-trial/


    Rubrik and AHV: Say No to Proxies

    For the last couple of years I have been a huge fan of backup software that removes the need for proxies. Rubrik provides a proxy-less backup solution by using the Nutanix Data Services Virtual IP address to talk directly to each individual virtual disk that it needs to back up.
    Rubrik and Nutanix have some key advantages with this solution:
    • AOS 5.1+ with version 3 APIs provides changed region tracking for quick, efficient backups with no hypervisor-based snapshot.
    • With AHV and data locality, Rubrik can grab the most recently changed data without flooding the network, which can happen when the copy and the VM do not live on the same host. For Nutanix the reads happen locally.
    • Rubrik has access to every virtual disk by making an iSCSI connection, which bypasses the need for proxies.
    • AOS can redirect the 2nd RF copy away from a node with its advanced data placement if the backup load becomes too great during a backup window, thus protecting your mission-critical apps that run 24/7.
    • Did I mention no proxies? 🙂
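
    The value of changed region tracking is easiest to see in a toy incremental copy: only the regions reported as changed are read from the source. The sketch below is a generic illustration using in-memory bytes. It does not call the Nutanix v3 API or Rubrik, and the function name and the (offset, length) region format are assumptions.

```python
def apply_changed_regions(source, backup, regions):
    """Copy only the changed (offset, length) regions from source into backup.

    source  -- bytes of the current virtual disk (stand-in for the real disk)
    backup  -- bytearray holding the previous backup, updated in place
    regions -- iterable of (offset, length) pairs reported as changed
    Returns the number of bytes copied.
    """
    copied = 0
    for offset, length in regions:
        end = offset + length
        if offset < 0 or length < 0 or end > len(source) or end > len(backup):
            raise ValueError(f"region ({offset}, {length}) is out of range")
        backup[offset:end] = source[offset:end]
        copied += length
    return copied


if __name__ == "__main__":
    disk = b"AAAABBBBCCCC"
    last_backup = bytearray(b"AAAAxxxxCCCC")   # only the middle 4 bytes differ
    moved = apply_changed_regions(disk, last_backup, [(4, 4)])
    print(moved, bytes(last_backup) == disk)
```

    Here 4 of 12 bytes are copied to bring the backup current, which is the same reason an incremental backup of a mostly idle disk finishes quickly.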

    Stop by the Rubrik booth and catch their session if you’re at .Next this week.


    Backing Up AFS Home Shares with Commvault

    You cannot back up Acropolis File Services (AFS) home shares with Commvault software until you change a setting on AFS. You need to give Commvault access to the home share without the use of reparse points. A home share is the repository for users’ personal files, and it distributes the top-level directories across all of the file server VMs for performance and ease of management. The home share contains reparse point attributes in its top-level directories to help with referrals. Because Commvault automatically skips these directories during backup due to the reparse points, we make the change below.

    AFS can disable reparse points for registered clients while leaving them enabled for clients that are not registered. I would list all of your proxies and media agents with this command.

    Run this command on any file server VM:

    scli smbcli set --section=global --param="backup hosts" --value=""


    Acropolis Container Services on #Nutanix

    This is the first release of a turnkey solution for deploying Docker containers in a Nutanix cluster. Instead of swiping your credit card for AWS EC2, you can deploy your containers through the built-in Self Service Portal. Not all of it is new, because Nutanix previously released a volume plug-in for Docker. What is new is:

    * Acropolis Container Services (ACS) provisions multiple VMs as container machines to run Docker containers on them.
    * Containers are deployed as part of projects. In projects, users can deploy VMs or containers. You can assign quotas to the projects over storage, CPU, and memory.
    * The public Docker registry is provided by default, but if you have a separate Docker registry you want to use, you can configure access to that registry as well.
    * One-Click upgrades for the Container machines.
    * Basic monitoring with a containers view in the self-service portal allows you to view summary information about containers connected to this portal and access detailed information about each container.


      Moby Project Summit Notes

      The Moby Project was born out of the containerd / Docker Internals Summit

      For components to be successful they need to be successful everywhere, which led into SwarmKit being mentioned as not being successful because no other ecosystem was using it. There seems to be a strong commitment to make everything into a component out in the open.

      Docker wants to be seen as an open source leader by doing the hard work to support components.

      All open-source development will be under the Moby project.

      Upstream = components
      Moby = Staging area for products to move on, like containerd is in the CNCF project.
      – Heart of open-source activities, a place to integrate components
      – Docker remains docker
      – Docker is built with Moby
      – You use Moby to build things like Docker
      – Solomon mentions “1000 of smart people could disagree on what to do”; Docker represents its opinion. It’s a lot easier to agree on low-level functions because there are few ways to do them.
      – Moby will end up as Go libraries in Docker but that will go away.

      Moby is connected to Docker but it’s not Docker. Name inspired from the Fedora project.

      Moby is a trade off to get it out in the open early versus completeness

      GitHub should be used as a support forum.

      InfraKit is a toolkit for creating and managing declarative, self-healing infrastructure. It breaks infrastructure automation down into simple, pluggable components. These components work together to actively ensure the infrastructure state matches the user’s specifications. Although InfraKit emphasizes primitives for building self-healing infrastructure, it also can be used passively like conventional tools
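
      The "declarative, self-healing" idea boils down to comparing the desired state with the actual state and acting on the difference. The sketch below is a generic reconcile loop in that spirit, not InfraKit's plugin API: the function names and the group-to-instance-count model are assumptions for illustration.

```python
def plan(desired, actual):
    """Return the actions needed to move actual toward desired.

    Both arguments map a group name to an instance count.
    """
    actions = []
    for group, want in sorted(desired.items()):
        have = actual.get(group, 0)
        if have < want:
            actions.append(("create", group, want - have))
        elif have > want:
            actions.append(("destroy", group, have - want))
    # Groups that exist but are no longer declared get removed entirely.
    for group in sorted(set(actual) - set(desired)):
        actions.append(("destroy", group, actual[group]))
    return actions


def reconcile(desired, actual):
    """Apply the plan to the in-memory actual state and return it."""
    for verb, group, count in plan(desired, actual):
        delta = count if verb == "create" else -count
        actual[group] = actual.get(group, 0) + delta
        if actual[group] == 0:
            del actual[group]
    return actual


if __name__ == "__main__":
    desired = {"managers": 1, "workers": 3}
    actual = {"workers": 1, "old-pool": 2}   # a worker died, a stale pool lingers
    print(plan(desired, actual))
    print(reconcile(desired, actual))
```

      Running the loop repeatedly is what makes it self-healing: once actual matches desired, the plan is empty and nothing happens.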

      LinuxKit, a toolkit for building custom minimal, immutable Linux distributions.

      – Secure defaults without compromising usability
      – Everything is replaceable and customisable
      – Immutable infrastructure applied to building Linux distributions
      – Completely stateless, but persistent storage can be attached
      – Easy tooling, with easy iteration
      – Built with containers, for running containers
      – Designed for building and running clustered applications, including but not limited to container orchestration such as Docker or Kubernetes
      – Designed from the experience of building Docker Editions, but redesigned as a general-purpose toolkit

      No master plans to change away from Go.

      Breaking out the monolithic engine API will most likely be done with gRPC. gRPC is a modern open source high performance RPC framework that can run in any environment. It can efficiently connect services in and across data centers with pluggable support for load balancing, tracing, health checking and authentication. It is also applicable in the last mile of distributed computing to connect devices, mobile applications and browsers to backend services.

      SwarmKit Update
      SwarmKit is a toolkit for orchestrating distributed systems at any scale. It includes primitives for node discovery, raft-based consensus, task scheduling and more.

      New Features

      – Topology-Aware Scheduling
      – Secrets
      – Service Rollbacks
      – Service Logs
      – HA scheduling
      – Encrypted Raft Store
      – Health-Aware Orchestration
      – Synchronous CLI
      What is Next?
      – Direct integration of containerd into SwarmKit bypasses the need for Docker Engine
      – Config Management to attach configuration to services
      – Swarm Events to watch for state changes and gRPC Watch API
      – Create a generic runtime to support new run times without changing SwarmKit
      – Instrumentation

      LibNetwork Update
      – Quality: more visibility, monitoring and troubleshooting.
      – Local-scoped network plugins in Swarm-mode
      – Integration with containerd