Nov
14

Docker Datacenter 2.0 for Virtual Admins

Just a short video walking through how easy it is to get an environment up and running with Docker Datacenter 2.0 on top of AHV.

High level points:

* If you can deploy a VM, you can set up Docker Datacenter
* Managing new Docker hosts is easily done with pre-generated code to paste into new hosts
* Docker Datacenter can run both services and Compose apps side by side in the same environment

Later this week I hope to have a post about the integration between Docker Datacenter and Docker Trusted Registry.

    Oct
    31

    Eliminate Standalone NAS & What’s new with Horizon 7

    Thought I would post the links to two new on-demand webinars. The Horizon 7 webinar has some Nutanix content but is mostly focused on Instant Clones, App Volumes, and user impact.

    Horizon 7: New Features and How it Impacts User Experience

    The AFS webinar has some great questions and there is a demo at the end as well.

    Eliminate Standalone NAS for your file server needs with Nutanix Acropolis File Services

    Sep
    16

    Build Large File Services Repositories on Nutanix’s Largest Capacity Nodes, the NX-6035C-G5

    Nutanix continues on its Enterprise Cloud journey at the .NEXT On-Tour event in Bangkok, Thailand. Today, we are proud to announce that we are planning to support Acropolis File Services (AFS) on our storage only nodes, the NX-6035C-G5. Acropolis File Services provides a simple and scalable solution for hosting user and shared department files across a centralized location with a single namespace. With Acropolis File Services, administrators no longer waste time with manual configuration or need Active Directory and load balancing expertise. If and when released, this will make 6035C-G5 nodes even more versatile, adding to the current capabilities of serving as a backup or replication target and running Acropolis Block Services.

    [read more here]

    Aug
    01

    The Tale Of Two Lines: Instant-Clones on Nutanix

    There was a part of me that wanted to hate on Instant Clones, new in Horizon 7, but the fact is they’re worth the price of admission. Instant Clones have very low overhead and provide true on-demand desktops, or as VMware is branding it, Just-In-Time desktops.

    On-demand desktops with View Composer..... not happening


    In my health care days, the non-persistent desktops and shift changes always resulted in some blunt force trauma around 7 am and 7 pm, when staff would start their day. The only real way to counterbalance the added load of login storms was to make sure the desktops were pre-built. This of course means you need to have some desktops sitting around doing nothing, waiting for these two time periods in the day, or use generic logins, in which case the user never disconnects, which was another bag of problems.

    Instant Clones’ ability to clone a live running VM by simply quiescing it is really amazing. Have you ever changed the name of a desktop and then Windows tells you to reboot? If you’re like me, you try to do five or six other things before you have to reboot, which usually ends up in a mess. Instant Clones use a feature called ClonePrep to add the VM to AD and change its name, all without having to reboot the VM. When you see a power-on operation inside vCenter, it’s actually just quiescing the desktop, so there is very low overhead.

    The steps during Clone Prep. MS does not support Clone Prep but they didn't for View Composer so I don't see it being any different.


    When I went to test Instant Clones, I wanted to see if on-demand desktops were actually possible without destroying node densities. I did two test runs with Login VSI: one run with 400 knowledge-worker users with all the desktops pre-deployed, and one run with 400 knowledge-worker users where I started with only 50 desktops. I set the desktop pool to always have at least 30 free desktops until the pool reached 400 desktops.
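The pool policy used in the on-demand run (start at 50 desktops, keep at least 30 free, cap at 400) can be sketched as a tiny model. This is my own illustration of the policy, not Horizon's actual provisioning code; the function and variable names are hypothetical:

```python
# Toy model of the on-demand pool policy from the test (hypothetical sketch,
# not Horizon's actual provisioning logic): start with 50 desktops, keep at
# least 30 free, never grow past 400.

POOL_MAX = 400
MIN_FREE = 30

def desktops_to_provision(total, in_use):
    """Return how many desktops the pool should clone right now."""
    free = total - in_use
    shortfall = max(0, MIN_FREE - free)   # free desktops below the floor
    headroom = POOL_MAX - total           # room left before the cap
    return min(shortfall, headroom)

# Simulate a login storm; in this toy model sessions may briefly outrun
# ready desktops while the pool catches up.
total, in_use = 50, 0
for arriving in [10, 25, 40, 60, 80, 80, 60, 45]:
    in_use += arriving
    total += desktops_to_provision(total, in_use)

print(total, in_use)  # → 400 400
```

The point of the sketch is that the pool only pays the cloning cost as demand arrives, instead of pre-building all 400 desktops up front.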

    Instant-clones delivers on-demand desktops with very low overhead.


    The darker blue line represents the on-demand test, and you can see that the impact over 400 users is pretty small. This is pretty remarkable given that the CPU and memory consumption at boot is almost eliminated.

    It’s not all unicorns and rainbows, however; Instant Clones do have some limitations in the first release:

    No dedicated Desktop Pools
    No RDS Desktop or Application Pools
    Limited SVGA Support – Fixed max resolution & number of monitors
    No 3D Rendering / GPU Support
    No Sysprep support – Single SID across pool
    No VVOL or VAAI NFS Hardware Clones support (Smaller desktops pools may take longer to provision)
    No Powershell
    No Multi-VLAN Support in a single Pool
    No Reusable Computer Accounts
    No Persistent Disks – Use Writable Volumes \ Flex App \ Unidesk \ RES …….

    vMotion is supported
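As a quick sanity check of a pool design against the first-release limitations listed above, here's a small hypothetical helper; the feature labels are my own shorthand, not Horizon API names:

```python
# Quick check of an Instant Clone pool design against the first-release
# limitations listed above (the feature labels are my own shorthand, not
# real Horizon identifiers).
UNSUPPORTED = {
    "dedicated_pool", "rds_pool", "3d_rendering", "sysprep",
    "persistent_disks", "multi_vlan", "reusable_computer_accounts",
    "powershell",
}

def blocked_features(requested):
    """Return which requested features Instant Clones can't deliver yet."""
    return sorted(set(requested) & UNSUPPORTED)

print(blocked_features({"dedicated_pool", "vmotion", "persistent_disks"}))
# → ['dedicated_pool', 'persistent_disks']
```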

    Like anything, use case will dictate when this gets used, but it’s a powerful tool inside of Horizon. I plan to show some of the differences between View Composer and Instant Clones in my next posts. Also keep in mind that you still need high IO to service your desktops. Size for the peaks or face the wrath of your end users.

    Jul
    28

    The Impact of App Layering on Your VDI Environment

    I was testing Instant Clones in Horizon 7, and it was pretty much a requirement to use some form of application virtualization and to get your user data stored off the desktops. My decision on what to select for testing was based on the fact that I already had ProfileUnity from Liquidware Labs, and App Volumes is bundled with View at the higher tiers. I wanted to see the impact of layering on CPU and login times. I also used UberAgent to collect some of the results. While testing, I would run one test with UberAgent to collect login times and then one with UberAgent turned off to collect CPU metrics.

    I used three separate applications, each in their own layer.

    * Gimp 2.8
    * iTunes 10
    * VLC

    I used App Volumes 2.11, since 3.0 is kind of dead in the water and not recommended for existing customers, so I can’t see a lot of people using it until the next release. ProfileUnity was version 6.5.

    I first did a base run with no AppStacks or Flex Apps, but with a roaming profile stored on Acropolis File Services. The desktops were instant clones running the Horizon 7 agent and Office 2013, on Windows 10 with 2 vCPU and 2 GB of RAM. The CPU percentages listed are a factor of both vCPUs.

    Base Run
    baserun

    So, not too bad: a 14-second login. There is probably some cleanup I could do to make it faster, but that’s also not that realistic if you’re thinking about an enterprise desktop, so I was happy with this.

    I tested one layer at a time until all three applications were in use. There was a gradual increase in CPU and login time with each layer. The CPU cost comes from the agent and from attaching the VMDK to the desktop.

    App Volumes with 3 AppStacks

    3appstacks

    So with 3 layers the CPU jumped by ~20% and the login time went up ~9 secs with App Volumes.

    3 Flex Apps


    flexapp

    With 3 Flex Apps, CPU jumped a bit and login times went up ~4 sec.
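Putting the rounded numbers from the runs above side by side (base ~14 s login, App Volumes +~9 s, FlexApp +~4 s), the relative login cost works out roughly as:

```python
# Rough relative login-time cost of each layering approach, using the
# rounded numbers reported in the test runs above (seconds).
base_login = 14.0
deltas = {"App Volumes (3 AppStacks)": 9.0, "FlexApp (3 Flex Apps)": 4.0}

for name, extra in deltas.items():
    total = base_login + extra
    print(f"{name}: {total:.0f}s login, +{extra / base_login:.0%} vs base")
```

So three AppStacks add roughly 64% to the base login time, while three Flex Apps add roughly 29%.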


    Overall Review

    layeringreview

    What does this all mean?

    Well, if you have users that only disconnect and reconnect and rarely log out, then this means absolutely nothing for the most part. If you have a user base that gets fresh new desktops all the time, and events like large shift changes, then it means your densities will go down. I like to say, “Looking is for free, and touching is going to cost you.” Overall, I still feel this is a small price to pay to have a successful VDI deployment, and layering will help out the process.

    Jul
    09

    Making A Better Distributed System – Nutanix Degraded Node Detection


    Distributed systems are hard, there’s no doubt about that. One of the major problems is what to do when a node is unhealthy and may be affecting the performance of the overall cluster. “Fail hard, fail fast” is a distributed-systems principle, but how do you go about detecting an issue before a failure even occurs? AOS 4.5.3, 4.6.2, and 4.7 will include the Nutanix implementation of degraded node detection and isolation. A badly performing hardware component or a network issue can be a death by a thousand cuts, versus a failure, which is pretty cut and dried. If a remote CVM is not performing well, it can affect the acknowledgement of writes coming from other hosts. Other factors that may affect performance include:

    * Significant network bandwidth reduction
    * Network packet drops
    * CPU Soft lockups
    * Partially bad disks
    * Hardware issues

    The list of issues can even be unknown, so Nutanix Engineering has come up with a scoring system that uses votes so that everything can be compared.
    Services running on each node of the cluster publish scores/votes for services running on other nodes. Peer health scores are computed from various metrics like RPC latency, RPC failures/timeouts, and network latency. If services running on one node consistently receive bad scores for a long period (~10 minutes), the other peers will convict that node as degraded.
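The voting scheme can be sketched roughly like this; it's my own illustration of the idea (a sliding window of peer scores, with conviction only after a full bad window), not Nutanix's actual implementation:

```python
from collections import defaultdict, deque

# Toy sketch of peer-vote degraded-node detection (my illustration of the
# idea, not Nutanix's actual implementation). Each peer periodically
# publishes a health score for every other node; if a node's scores stay
# bad for the whole observation window, its peers convict it as degraded.

WINDOW = 10          # scoring intervals (~10 minutes at one per minute)
BAD_THRESHOLD = 50   # average scores below this count as "bad"

class DegradedNodeDetector:
    def __init__(self):
        # node -> sliding window of one interval's peer scores each
        self.history = defaultdict(lambda: deque(maxlen=WINDOW))

    def record_interval(self, node, peer_scores):
        """Store one interval's worth of peer scores for a node."""
        self.history[node].append(peer_scores)

    def is_degraded(self, node):
        """Convict only if every interval in a full window is bad."""
        window = self.history[node]
        if len(window) < WINDOW:
            return False   # not enough evidence yet
        return all(
            sum(scores) / len(scores) < BAD_THRESHOLD for scores in window
        )

detector = DegradedNodeDetector()
for _ in range(WINDOW):
    detector.record_interval("node-3", [20, 35, 10])   # consistently bad
    detector.record_interval("node-1", [90, 85, 95])   # healthy
print(detector.is_degraded("node-3"), detector.is_degraded("node-1"))
# → True False
```

Requiring a full window of bad scores is what keeps a single transient hiccup from convicting an otherwise healthy node.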

    Walk, Crawl, Run – Degraded Node Expectations:

    A node will not be marked as degraded if the current cluster Fault Tolerance (FT) level is less than the desired value. Upgrades and break-fix actions will not be allowed while a node is in the degraded state. A node will only be marked as degraded if we get bad peer health scores for 10 minutes. In AOS 4.5.3, the first shipping AOS release to include this feature, the default settings are that degraded node logging is enabled but the degraded node action is disabled. In AOS 4.6.2 and 4.7, additional user controls will be provided to select an “action policy” for when a degraded node is detected; options should include No Action, Reboot CVM, or Shutdown Node. While the peer scoring is always on, the action side is disabled in the first release as an ultra-conservative approach.

    In AOS 4.5.3, if the degraded node action setting is enabled, leadership of critical services will not be hosted on the degraded node. A degraded node will be put into maintenance mode and its CVM will be rebooted. Services will not start on this CVM upon reboot, and an alert will be generated for the degraded node.


    To enable degraded node monitoring, use the NCLI command:

    nutanix@cvm:~$ ncli cluster edit-params disable-degraded-node-monitoring=false

    This feature will further increase availability and resiliency for Nutanix customers. While top performance numbers grab the headlines, remember the first step is to have a running cluster.

    AI for the control plane………… Maybe we’ll get voted out of our jobs!

    Jun
    29

    Nutanix Security Configuration Management Automation at Work #DOD #PCI

    A short video of someone changing the security settings for an Apache Tomcat directory and its files. It really could be anything: dropping a firewall, opening a port; the list goes on. The video shows how often the settings are checked, and then we manually run the automation framework to check over 600 DOD/PCI-level requirements in minutes.
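The check-and-remediate idea in the video can be illustrated with a toy example; the baseline mode and helper function here are hypothetical, not the actual Nutanix framework:

```python
import os
import stat
import tempfile

# Toy sketch of security configuration check-and-remediate (my illustration
# of the idea, not Nutanix's actual framework): verify a directory's mode
# against a baseline and put it back if someone loosened it.
BASELINE_MODE = 0o750   # hypothetical required mode for a Tomcat directory

def check_and_remediate(path, baseline=BASELINE_MODE):
    """Return True if the path already complied; otherwise fix it and return False."""
    current = stat.S_IMODE(os.stat(path).st_mode)
    if current == baseline:
        return True
    os.chmod(path, baseline)   # remediate the drift
    return False

d = tempfile.mkdtemp()
os.chmod(d, 0o777)             # simulate someone loosening the settings
print(check_and_remediate(d))  # → False (drift detected and fixed)
print(check_and_remediate(d))  # → True (now compliant)
```

A real framework runs checks like this on a schedule across hundreds of requirements, which is why manual tampering gets reverted within minutes.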

    Jun
    27

    Nutanix Search to Find, Build, Create and Improve

    To streamline access to features, Nutanix lets you quickly search for data points and reduces the clicks required to find information through the search function. Prism Pro delivers a web-like search engine experience for your Nutanix environment. Administrators can simply enter common tasks and entities into the search bar to perform searches. The interface displays the returned results in four vertical columns, each representing a different type of result relating to the search query.
    The four columns present a list of entities, top analytics about the entities, appropriate actions, related alerts, and help topics that relate to the entities. The help topics provide links to online Nutanix documentation that can help explain features and clarify how to configure them or perform corrective actions.

    search

    The search function offers autocomplete to help administrators identify or complete the string that they want to search for.

    auto

    Nutanix embodies a radically new approach to enterprise infrastructure—one that simplifies every step of the infrastructure life cycle, from buying and deploying to managing, scaling, and supporting.

    Read more about managing your infrastructure with Prism Pro from Brian Suhr

    May
    12

    Impact of Nutanix VSS Hardware Support

    When 4.6 was released, I wrote about how the newly added VSS support in Nutanix Guest Tools (NGT) was the gem of the release. That was a fairly big compliment, considering some of the important updates in the release, like cross-hypervisor DR and another giant leap in performance.

    I finally set some time aside to test the impact of taking an application-consistent snapshot with VMware Tools vs. Nutanix VSS hardware support.

    When an application-consistent snapshot workflow runs without NGT on ESXi, we take an ESXi snapshot so that VMware Tools can be used to quiesce the file system. Every time we take an ESXi snapshot, it results in the creation of delta disks. During this process, ESXi “stuns” the VM to remap its virtual disks to these delta files. The length of the stun depends on the number of virtual disks attached to the VM and the speed at which the delta disks can be created (the capability of the underlying storage to process NFS metadata update operations, plus releasing/creating/acquiring lock files for all the virtual disks). During this time, the VM is totally unresponsive: no application will run inside the VM, and pings to the VM will fail.

    We then delete the snapshot (after backing up the files via a hardware snapshot on the Nutanix side), which results in another set of stuns (deleting a snapshot causes two stuns: one fixed-time stun plus another stun based on the number of virtual disks). This essentially means that we are causing two or three stuns in rapid succession. These stuns cause metadata updates in addition to the flushing of data during the VSS snapshot operations.
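To see why the total stun time grows with the number of virtual disks, here's an illustrative cost model; all the constants are made up for illustration, since real stun times depend on the underlying storage:

```python
# Illustrative model of ESXi snapshot stun time (all constants are made up;
# real stun times depend on the storage backend and workload).
CREATE_STUN_PER_DISK = 0.5   # s: remap each virtual disk to its delta file
DELETE_FIXED_STUN = 1.0      # s: fixed stun when consolidating the snapshot
DELETE_STUN_PER_DISK = 0.8   # s: per-disk stun during consolidation

def esxi_snapshot_stun(num_disks):
    """Total unresponsive time across the create and delete stuns."""
    create = num_disks * CREATE_STUN_PER_DISK
    delete = DELETE_FIXED_STUN + num_disks * DELETE_STUN_PER_DISK
    return create + delete

def ngt_snapshot_stun(num_disks):
    """NGT quiesces inside the guest, so there is no hypervisor stun."""
    return 0.0

for disks in (1, 4, 8):
    print(f"{disks} disks: ESXi ~{esxi_snapshot_stun(disks):.1f}s stun, "
          f"NGT {ngt_snapshot_stun(disks):.1f}s")
```

Whatever the real constants are, the shape is the same: ESXi stun time scales with disk count across multiple stuns, while the NGT path avoids the hypervisor stun entirely.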

    Customers have reported that in a set of VMs running Microsoft clustering, VMs can be voted out due to heartbeat failure. VMware gives customers guidance on increasing timers if you’re using Microsoft clustering, to get around this situation.

    To test this out, I used HammerDB with SQL Server 2014 running on Windows Server 2012 R2. The tests were run on ESXi 6.0 with hardware version 11.

    sqlvm

    VMware Tools with VSS based Snapshot
    I was going to try to stitch the images together because of the time the process took, but decided to leave them as is.
    VMware-VSS-1vmwaretools

    VMware-VSS-2vmwaretools

    The total process took ~4 minutes.

    NGT with VSS Hardware Support based Snapshot
    NGT-based VSS snapshots don’t cause VM stuns. The application will be stunned temporarily within Windows to flush the data, but pings and other things will still work.

    NGT-VSS-Snapshot

    The total process took ~1 minute.

    Conclusion

    NGT with VSS hardware support is the belle of the ball! While there is no fixed number for the maximum stun time (it depends on how heavy the workload is), what we can see is the effect of not using NGT for application-consistent snapshots, and it’s pretty big. The collapsing of ESXi snapshots causes additional load and should be avoided if possible. NGT offers a hypervisor-agnostic approach and currently works with AHV as well.

    Note: Hypervisor snapshot consolidation is better in ESXi 6 than ESXi 5.5.

    Thanks to Karthik Chandrasekaran and Manan Shah for all their hard work and contribution to this blog post.

    Apr
    27

    SAP Best Practices and Sizing on Nutanix

    At the heart of SAP Business Suite is the SAP ERP application, which is supplemented by SAP CRM, SAP SRM, SAP PLM, and SAP SCM. From financial accounting through manufacturing, logistics, sales, marketing, and human resources, SAP Business Suite manages all the key mission-critical business processes that occur each day in companies around the world. SAP NetWeaver is the technical foundation for many SAP applications; it is a solution stack of SAP’s technology products.

    Deploying and operating SAP Business Suite applications in your environment is not a trivial task. Nutanix enterprise cloud platforms provide the reliability, predictability, and performance that the SAP Business Suite demands, all with an efficient and elegant management interface.

    The Nutanix platform offers SAP customers a range of benefits, including:

    • Lower risk and cost on the first hyperconverged platform SAP-certified for NetWeaver applications.
    • A turnkey validated framework that dramatically reduces the time to deploy your SAP applications.
    • Mission-critical availability with a self-healing foundation and VM-centric data protection, including support for the top enterprise backup solutions.
    • Flexibility to choose among industry-leading SAP-supported hypervisors.
    • Simplified operations, including application- and VM-level metrics alongside single-click provisioning and upgrades.
    • Reduced TCO from infrastructure right-sized for your SAP workload.
    • A best-in-class worldwide support system whose knowledge and commitment to customer service has earned the Omega NorthFace Scoreboard Award for three consecutive years.

    Read the Solution Note for best practices with both Hyper-V and VMware and sizing guidelines => SAP Solution Note