Nutanix 4.5 – DR and Backup For The Fitness Nut and New Features


    DR practices in environments that aren’t heavily regulated can be a lot like eating right. You know you should eat more fruits and vegetables, but a lot of the time people pick the easiest or cheapest option, which usually means doing nothing until something really bad happens. The release of the Acropolis Operating System 4.5 continues the trend of greater flexibility and ease of use, from SMBs up to the largest of enterprises. Disaster recovery (DR) support has been verified for up to 50 VMs per protection domain (a grouping of VMs with a common RPO) and for the following replication topologies: 1-to-1, many-to-1, and bidirectional. You can have as many protection domains as you want, which gets around the hurdles of managing LUNs on both sides.

    New in 4.5 for DR
    Bandwidth Limit on Schedule
    Max bandwidth throttles replication traffic between sites when no network device can limit it for you. Setting a max bandwidth does not guarantee that you will observe that throughput; it is only a ceiling. Max bandwidth policies can also vary throughout the day: assign a lower limit for the windows when your sites are busy with production traffic, and deselect the windows that aren’t busy (deselected time slots turn white in the scheduling grid).

    This setting is in MB/s, not Mb/s, which is important to note when talking with your networking teams.
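    A quick sanity check makes the units concrete (a minimal sketch with hypothetical numbers, not a Nutanix tool):

```python
def link_usage_mbps(throttle_MBps):
    """Convert a max-bandwidth setting in MB/s to line rate in Mb/s."""
    return throttle_MBps * 8  # 8 bits per byte

# A 100 MB/s throttle consumes 800 Mb/s -- close to saturating a 1 GbE WAN link.
print(link_usage_mbps(100))  # 800
```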

    Cloud Connect for Azure and AWS
    The cloud connect feature for Azure enables you to back up and restore copies of virtual machines and files between an on-premises cluster and a Nutanix Controller VM located in the Microsoft Azure cloud. Once configured through the Prism web console, the remote site cluster is managed and monitored through the Data Protection dashboard like any other remote site you have created and configured. This feature is currently supported for ESXi hypervisor environments only. Cloud Connect for AWS is now supported on all supported hypervisors.

    NX-6035C Clusters Usable as a Target for Replication
    You can use a Nutanix NX-6035C cluster as a target for Nutanix native replication and snapshots created by source Nutanix clusters in your environment. You can configure the NX-6035C as a target for snapshots, set a longer retention policy than on the source cluster (for example), and restore snapshots to the source cluster as needed. Its low cost per TB, combined with Erasure Coding (EC-X) that is now GA, makes for a pretty compelling solution. Keep in mind EC-X is complementary to deduplication and compression.

    The source cluster hypervisor environment can be AHV, Hyper-V, or ESXi. We are starting to see service providers offer DR and backup for their customers, and this will go a long way to help them. One Canadian Nutanix-backed service provider is CloudSFY.

    Network Mapping
    Network mapping allows you to control the network configuration for VMs when they are started on the remote site. This feature enables you to specify network mappings between the source cluster and the destination cluster. The remote site wizard includes an option to create one or more network mappings and allows you to select the source and destination networks from drop-down lists. You can also modify or remove network mappings as part of modifying the remote sites. Today it’s just for AHV <-> AHV movement of VMs, but you can tell from the picture where this is headed.

    Prism Central Can Now Be Deployed on the Acropolis Hypervisor (AHV)
    Nutanix has introduced a Prism Central OVA which can be deployed on an AHV cluster by leveraging Image Service features. Great for managing all of your remote sites.

    CommVault IntelliSnap Integration
    Cluster-level API integration with CommVault allows the backup tool to directly manage the storage tier for snapshots and backups. This gives Nutanix customers the ability to have the entire datacenter’s backups managed by a single tool, across Nutanix and non-Nutanix deployments using CommVault.

    File Level Restore in Tech Preview
    The file level restore feature allows a virtual machine user to restore a file within a virtual machine from the Nutanix protected snapshot with minimal Nutanix administrator intervention.

    Behind the scenes, the whole Engineering team at Nutanix has done a ton of work around snapshots and performance as well. Their work won’t show up anywhere glamorous, but rest assured a lot of testing has gone into 4.5. Testing with highly transactional workloads passed with flying colours.

    VMware SRM was already supported from earlier versions.


    McAfee MOVE For The Acropolis Hypervisor

    McAfee MOVE AntiVirus Multi-Platform lives up to its name. MOVE Multi-Platform is hypervisor agnostic, so it fits perfectly into what Nutanix is doing from a VM mobility standpoint. If you want to do DR or migrate VMs between different hypervisors, you don’t have to worry about your AV solution getting in the way as long as your VMs have access to the Offload Scan Servers. This solution removes the need to install a full anti-virus application on every VM, and it is the original agent-based deployment option.

    The solution has been around for a while and has been steadily improved upon. I actually ran this years ago for a VMware View VDI environment, and back then a limitation was that it couldn’t do on-demand scanning, which is no longer the case. The targeted on-demand scan feature allows the administrator to select a system or a group of systems from the System Tree in McAfee ePO and assign a client task to initiate the on-demand scan immediately. Pretty handy if you just want to target one area when you’re checking for something specifically related to a threat.

    The Multi-Platform deployment option offloads all scanning to a dedicated VM — an offload scan server — that runs VirusScan Enterprise software. Guest VMs are no longer required to run anti-virus software locally, which improves performance for anti-virus scanning and increases VM density per hypervisor.


    The Multi-Platform deployment option:
    • Uses McAfee ePO to manage the MOVE configuration on the client systems, offload scan server, and SVA Manager (OSS Manager).
    • Leverages the McAfee Agent for policy and event handling.
    • Uses McAfee ePO for reports on viruses that are discovered on the VMs.

    RAM disk for scanning

    A RAM disk is used by the OSS for file scanning, and it significantly reduces disk I/O on the offload scan server. You can enable the RAM disk option in the ePolicy Orchestrator server. The RAM disk is created by the OSS, and it improves OSS performance by reducing scan time. This way you can save your flash on Nutanix for your workloads.

    If you’re looking for an AV option for AHV, McAfee Move Multi-Platform is an option you should look at.


    Make Hadoop More Resilient and Space Efficient with HDP and Nutanix

    Hadoop 2.0 – Storage Consumption

    With the Hortonworks Data Platform on Nutanix solution you have the flexibility to start small with a single block and scale up incrementally a node at a time. This provides the best of both worlds–the ability to start small and grow to massive scale without any impact to performance.

    The below diagram shows a typical workflow when a client starts a job that is using MapReduce. We want to focus on what happens when a DataNode writes to disk.


    Hadoop 2.0 Workflow
    1. Client submits a job
    2. Response with ApplicationID
    3. Container Launch Context
    4. Start ApplicationMaster
    5. Get Capabilities
    6. Request / Receive Containers
    7. Container Launch Requests
    8. Data being written

    In step 8 from Figure 9, Node 1 is writing to its local disk and creating local copies. By default, DFS replication is set to 3. That means for every piece of data that is created, 3 copies are stored. The 1st copy is stored on the local node (A1), the 2nd copy will be placed off-rack if possible, and the 3rd copy will be placed on a random node in the same rack as the 2nd copy. This is done for data availability and allows multiple nodes to use the copies of data to parallelize their efforts and get fast results. When new jobs are run, NodeManagers will be selected where the data resides to reduce network congestion and increase performance. RF3 with Hadoop has an overhead of 3X.
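    The placement policy described above is easy to reason about in code. Here is a minimal sketch of the default HDFS-style placement (simplified and hypothetical; the real NameNode also weighs node load, free space, and configured topology):

```python
import random

def place_replicas(writer, nodes_by_rack, rf=3):
    """Simplified HDFS default block placement: 1st replica on the
    writing node, 2nd on a different rack when one exists, 3rd on
    another node in the 2nd replica's rack."""
    local_rack = next(r for r, ns in nodes_by_rack.items() if writer in ns)
    replicas = [writer]
    remote_racks = [r for r in nodes_by_rack if r != local_rack]
    rack2 = random.choice(remote_racks) if remote_racks else local_rack
    replicas.append(random.choice(
        [n for n in nodes_by_rack[rack2] if n not in replicas]))
    if rf >= 3:
        replicas.append(random.choice(
            [n for n in nodes_by_rack[rack2] if n not in replicas]))
    return replicas

racks = {"rack1": ["n1", "n2", "n3"], "rack2": ["n4", "n5", "n6"]}
print(place_replicas("n1", racks))  # e.g. ['n1', 'n5', 'n4']
```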

    Hadoop 2.0 on Nutanix- Storage Consumption

    Both Hadoop and Nutanix have similar architectures around data locality and using replication factor for availability and throughput. This section will give you a good idea of the impact of changing the replication factor on HDFS and ADSF.

    Test & Development Environments

    For test and development environments, the HDFS replication factor can be set to 1. Since the performance requirement is lower, you can drop the value and save on storage consumption. With the Acropolis Replication Factor set to 2, availability will be handled by ADSF.

    Hadoop on ADSF Parameters for Test/Dev

    Item                              | Detail | Rationale
    HDFS Replication Factor (RF)      | 1      | Performance isn’t as important; data availability is handled by Nutanix
    Acropolis Replication Factor (RF) | 2      | Data availability


    In the above diagram, once the local DataNode writes A1, ADSF will create B1 locally and will create the 2nd copy based on Nutanix availability domains. Since the Hadoop DataNodes will only have knowledge of the A1 copy, you can use Acropolis High Availability (HA) to quickly restart your NameNode in the event of a failure. With this configuration the HDFS / ADSF solution has an overhead of 2X (HDFS RF 1 x ADSF RF 2).

    Production Environments

    In production environments a minimum of HDFS RF 2 should be used so the NameNode has multiple options for placing containers where YARN can work with local data. RF 2 on HDFS also helps with job reliability if a physical node or VM goes down due to error or maintenance. YARN jobs can quickly restart using the built-in mechanisms, and by using the recommendations below you get enterprise-class data availability with ADSF.

    Hadoop on ADSF Parameters for Production

    Item                              | Detail | Rationale
    HDFS Replication Factor (RF)      | 2      | Hadoop job reliability and parallelization
    Acropolis Replication Factor (RF) | 2      | Data availability


    In the above diagram, once the local DataNode writes A1, ADSF will create B1 locally and will create the 2nd copy based on Nutanix availability domains. HDFS also writes A2, so the same process happens with C1 and C2 being created synchronously. Since the Hadoop DataNode will have knowledge of both A1 and A2, both copies can be used for task parallelization.
    In this environment you would potentially have 1 extra copy of data versus traditional Hadoop. To address the extra storage consumption you can apply EC-X. As an example, you may have a 30-node Hadoop cluster built with NX-6235 nodes, which would have ~900 TB of raw capacity. If you set the EC-X stripe width to 18/1 (an overhead of roughly 1.06X), you can work out the following overhead.

    Usable storage = ((20% * total raw capacity / ADSF RF overhead) + (80% * total raw capacity / EC-X overhead)) / HDFS RF
    Usable storage = ((0.2 * 9252 GB / 2) + (0.8 * 9252 GB / 1.06)) / 2
    Usable storage = (925.2 GB + 6982.6 GB) / 2
    Usable storage = 7907.8 GB / 2
    Usable storage = 3953.9 GB
    Therefore 9252 GB / 3953.9 GB = 2.34X overhead, which is less than traditional Hadoop’s 3X.
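    If you want to plug in your own numbers, the calculation is easy to script. This is a minimal sketch under the same assumptions as above (ADSF RF 2, an 18/1 EC-X stripe at ~1.06X overhead, and roughly 80% of the data being cold enough for EC-X to encode; the function name and defaults are mine, not a Nutanix tool):

```python
def usable_storage_gb(raw_gb, hdfs_rf=2, adsf_rf=2,
                      ecx_overhead=1.06, cold_fraction=0.8):
    """Estimate usable HDFS capacity on Nutanix (illustrative only)."""
    hot = (1 - cold_fraction) * raw_gb / adsf_rf   # still RF-protected
    cold = cold_fraction * raw_gb / ecx_overhead   # erasure coded by EC-X
    return (hot + cold) / hdfs_rf

raw = 9252
usable = usable_storage_gb(raw)
print(f"usable: {usable:.1f} GB, overhead: {raw / usable:.2f}X")
# usable: 3953.9 GB, overhead: 2.34X
```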

    Nutanix provides the ideal combination of compute and high-performance local storage: the best possible architecture for Hadoop and other distributed applications, with more space to perform business analytics.


    Why virtualize Hadoop nodes on the Nutanix Xtreme Computing Platform?


    o Make Hadoop an App: Prism’s HTML 5 user interface makes managing infrastructure pain free with one-click upgrades. Integrated data protection can be used to manage golden images for Hadoop across multiple Nutanix clusters. Painful firmware upgrades are easily addressed and time is saved.
    o No Hypervisor Tax: The Acropolis Hypervisor is included with all Nutanix clusters. Acropolis High Availability and automated Security Technical Implementation Guides (STIGs) keep your data available and secure.
    o Hardware utilization: Bare-metal Hadoop deployments average 10-20% CPU utilization, a major waste of hardware resources and datacenter space. Virtualizing Hadoop allows for better hardware utilization and flexibility. Virtualization can also help right-size your solution: if your job completion times are meeting their windows, there is no need to buy more hardware, and if more resources are needed, they can easily be adjusted.
    o Elastic MapReduce and scaling: Dynamic addition and removal of Hadoop nodes based on load allows you to scale based upon your current needs, not what you expect. Enable supply and demand to be in true synergy. Hadoop DataNodes can be quickly cloned out in seconds.
    o DevOps: Big Data scientists demand performance, reliability, and a flexible scale model. IT operations relies on virtualization to tame server sprawl, increase utilization, encapsulate workloads, manage capacity growth, and alleviate disruptive outages caused by hardware downtime. By virtualizing Hadoop, data scientists and IT ops mutually achieve all objectives while preserving autonomy and independence for their respective responsibilities.
    o Sandboxing of jobs: Buggy MapReduce jobs can quickly saturate hardware resources, creating havoc for the remaining jobs in the queue. Virtualizing Hadoop clusters encapsulates and sandboxes MapReduce jobs away from other important sorting runs and general-purpose workloads.
    o Batch scheduling & stacked workloads: Allow all workloads and applications to co-exist, e.g. Hadoop, virtual desktops, and servers. Schedule job runs during off-peak hours to take advantage of idle night-time and weekend hours that would otherwise go to waste. Nutanix also allows you to bypass the flash tier for sequential workloads, which avoids the time it takes to rewarm the cache for mixed workloads.
    o New Hadoop economics: Bare-metal implementations are expensive and can spiral out of control. The downtime and underutilized CPU that come with physical server workloads can jeopardize project viability. Virtualizing Hadoop reduces complexity and ensures success for sophisticated projects with a scale-out, grow-as-you-go model, a perfect fit for Big Data projects.
    o Blazing fast performance: Up to 3,500 MB/s of sequential throughput in a compact 2U 4-node cluster. A TeraSort benchmark yields 529 MB/s in the same 2U cluster.
    o Unified data platform: Run multiple data processing platforms along with Hadoop YARN on a single unified data platform, the Acropolis Distributed Storage Fabric (ADSF).
    o Flash SSDs for NoSQL: The summaries that roll up to a NoSQL database like HBase are used to run business reports and are typically memory- and IOPS-heavy. Nutanix offers SSD tiers coupled with dense memory capacities, and its automatic tiering technology can transparently move IOPS-heavy workloads to the SSD tier.
    o Analytic high-density engine: With the Nutanix solution you can start small and scale. A single Nutanix block comes packed with up to 40TB of storage and 96 cores in a compact 2U footprint. Given the modularity of the solution, you can granularly scale per node (up to ~10TB/24 cores), per block (up to ~40TB/96 cores), or with multiple blocks, giving you the ability to accurately match supply with demand and minimize upfront CapEx.
    o Change management: Maintain environmental control and separation between development, test, staging, and production environments. Snapshots and fast clones can help in sharing production data with non-production jobs without requiring full copies and unnecessary data duplication.
    o Business continuity and data protection: Nutanix can provide replication across sites for additional protection of the NameNode and DataNodes. Replication can be set up to avoid sending wasteful temporary data across the WAN, using per-VM and container-based replication.
    o Data efficiency: The Nutanix solution is truly VM-centric for all compression policies. Unlike traditional solutions that perform compression mainly at the LUN level, the Nutanix solution provides all of these capabilities at the VM and file level, greatly increasing efficiency and simplicity. These capabilities ensure the highest possible compression/decompression performance on a sub-block level. While developers may or may not run jobs with compression, IT operations can ensure cold data is effectively stored. Nutanix Erasure Coding can also be applied on top of the compression savings.
    o Automatic auto-leveling and auto-archive: Nutanix spreads data evenly across the cluster, ensuring local drives don’t fill up and cause an outage while space is still available elsewhere. Using Nutanix cold storage nodes, cold data can be moved off compute nodes, freeing up room for hot data while not consuming additional licenses.
    o Time-sliced clusters: Like public cloud EC2 environments, Nutanix can provide a truly converged cloud infrastructure, allowing you to run your Hadoop, server, and desktop virtualization on a single converged cloud. Get the efficiency and savings you require with a converged cloud on a truly converged architecture.


    Why is Nutanix delivering the Acropolis Hypervisor?

    CEO Dheeraj Pandey talks about why Nutanix is delivering another hypervisor and about Prism.


    Nutanix Acropolis and XenDesktop – Operations Made Easy – NPX / VMturbo

    VDI is hopefully run and maintained by your desktop team, not the highest-paid people on your operations team. One of the main reasons the Acropolis hypervisor was created was to control the experience as the desktops are managed and as the environment scales. Two of the most sought-after questions when designing a VDI solution are how many desktops to run per LUN/NFS export/container and how many desktops per management resource (vCenter/SCVMM). On the number of desktops per container for NFS with vCenter on Nutanix, the VPXA agent would have problems around 2,000 desktops per container. That is still a far cry from the typical 150 for block storage and 250 desktops for traditional NFS-based storage, but it was something you needed to account for. With the Acropolis hypervisor there is no need to keep track of the number of desktops running in a container, which makes that design consideration pretty easy.

    Nothing was worse for me than spending a Saturday afternoon updating my VDI environment, waiting for the desktops to update in a serial fashion. This is one of the main reasons both vCenter and SCVMM are architected to support only ~2,000 desktops for VDI; any more and you’ll be waiting around as you update your desktops. Since Prism (Nutanix management) is based on scale-out principles (MapReduce/Apache Cassandra), we can parallelize the service requests across the whole cluster. An added benefit is that Prism is highly available, so you don’t have to worry about a single point of failure in your environment. The quick sketch below shows why the parallelism matters.

    Avoid the bottleneck of virtualization management with Acropolis.

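    As a rough illustration of serial versus parallel update pushes (hypothetical timings, not a benchmark):

```python
import math

def update_window_minutes(desktops, minutes_per_desktop, workers=1):
    """Rough time to push an image update serially vs. in parallel."""
    return math.ceil(desktops / workers) * minutes_per_desktop

print(update_window_minutes(2000, 2))               # 4000 min (~2.8 days) serially
print(update_window_minutes(2000, 2, workers=100))  # 40 min fanned out across a cluster
```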

    The Acropolis hypervisor doesn’t have memory over-commitment today, but between large pages from Windows 7 onward and highly dynamic environments where machines are constantly rebooted or re-created, the case for memory over-commitment is not as strong as it was in the Windows XP days. The Acropolis hypervisor can place machines at boot where the majority of their data resides and based on available resources. To further the integration of data placement and compute resources, check out what VMturbo is doing for QoS on Nutanix.

    By simplifying day-2 support for VDI, user experience will go up with consistent performance and availability. Acropolis is freed from the burden of past virtualization management choices and continues to provide one-click everything, running day-to-day operations with ease. I’m looking forward to seeing the Citrix and Nutanix relationship grow even more over the coming months.


    FSLogix For User Profiles and Application Support For Any Hypervisor

    A couple of weeks ago I had a great conversation with Cassondra McAllister, who is an SE at FSLogix. She has a lot of field time around user profiles, as she was at RTO before they were bought by VMware. Cassondra has been working at FSLogix from the early stages, and I was lucky to get a demo from her.

    FSLogix uses image masking to create a single unified base image that hides everything a logged-in user shouldn’t see, providing predictable and real-time access to applications and profiles for VDI/XenApp. Companies can combine all applications, plus browser and app plugins, onto a single gold image, or greatly reduce their current number of images. Combine this with quick clones across Acropolis, Hyper-V, and vSphere, and the fact that FSLogix works in user space, and you can have a pretty simple solution for VDI and XenApp/RDS on Nutanix. This is all possible because FSLogix works at logon time to attach the appropriate VHD to the desktop while maintaining fast logons.

    User profiles with FSLogix have shown shorter logon and logoff times, with an improvement of 50% or more compared to roaming profiles, and they generate less traffic and processing load on the file server(s), since files are only accessed if needed. Combine this with Nutanix’s up-and-coming file services and you’ll have a strong and easy combination for VDI.

    The core of the FSLogix Apps software agent is the driver (frxdrv.sys). This component is a file system mini-filter driver. It is primarily responsible for intercepting requests from other software on the system to access objects such as files or registry keys, and changing the results. When an application is being hidden, the changes might involve simply making the object appear not to exist.

    The second major component in the FSLogix Apps software agent is the service (frxsvc.exe). This component is a Windows service running as SYSTEM. It is responsible for communicating various data about the system state to the driver, e.g., informing the driver when new users log in to the machine.

    The other components of the software agent are various user interfaces and supporting files. There is a GUI rule editor that assists administrators in creating rule sets, hiding applications installed on a system, making rule assignments to control how rules are applied, testing rules, etc. There is also a command-line interface that can be used for scripting many of the functions that the software can perform. A Windows Event Viewer integration module is also provided to assist administrators in the management and audit of the software.

    GUI Rule Editor

    People should really check FSLogix out as a cost-effective solution with low overhead, both in required infrastructure and on the desktop.


    Introduction to DevOps with Udacity

    Me and Karl


    This course was an awesome side project. I was lucky to help with Intro to DevOps side by side with Karl Krueger. Karl really is the star of the show: he worked at Google as an SRE for over 8 years and has some great insights from the development side. Most of my life has been on the infrastructure side of the house, but I did have a brief journey in development in the early 2000s. I think being on both sides of the fence has allowed me to have a lot of empathy for my developer friends.

    Like Karl, all the people at Udacity are top notch, and they made me question how they can pay for all the talent they have. Gundega Dekena did a lot of work from halfway across the world too, putting a ton of effort into the course content and drawing on personal experience as well. When you take a Udacity course you really get a team of people producing the content. I would encourage you to check out some of their free content, and if you like it, grab a subscription. The video work used for the demos is really amazing compared to what I have seen on the market.

    Let me know if you take the course and how it went.



    Is Hyperconverged Infrastructure More Secure Than Traditional Infrastructure? #Podcast

    An interesting talk about what Nutanix is doing around security with Eric Hammersley, a security architect at Nutanix. Eric talks about the security development life cycle, STIGs, and using automation to batten down the hatches of the XCP platform. Eric also talks about the advantage of knowing your own platform and how that can enable engineering to build the most secure platform possible.

    I find it funny how the most hated guy in the data center (the security guy) is now getting a lot of respect. I am sure the endless supply of credit card information ending up on the world wide web is helping to fuel this. Give the podcast a listen and let us know what you think.



    Podcast Lollapalooza – EUC Podcast and Frontline Chatter

    Last week I was able to sneak onto two podcasts: the End User Computing Podcast and Frontline Chatter. I have to report with deep sadness that the 2016 .Next Conference is NOT in Sydney; my love of rugby must have gotten the better of me. The 2016 .Next conference is in Las Vegas.

    The EUC Podcast was mostly around Synergy and cloud management with a sprinkle of layering, and Frontline Chatter was a great review of the announcements for .Next, including what Acropolis is as a whole.

    Listen to both today and let me know what you think.