Jul
    19

    Securing the Supply Chain with Nutanix and Docker #dockercon2016

    I was watching the video below from DockerCon 2016 and there were lots of striking similarities between what Nutanix and Docker are doing to provide a secure working environment for the Enterprise Cloud. There is no sense turning on the alarm for your house and then not locking the doors. You need to close all the gaps for your infrastructure and the applications that live on top of it.

    The most interesting part of the session for me was the section on security scanning and gating. Docker has Security Scanning, which is available as an add-on to Docker-hosted private repositories on both Docker Cloud and Docker Hub. Scans run each time a build pushes a new image to your private repository. They also run when you add a new image or tag. Most scans complete within an hour; however, large repositories may take up to 24 hours to scan. The scan traverses each layer of the image, identifies the software components in each layer, and indexes the SHA of each component.
    The scan compares the SHA of each component against the Common Vulnerabilities and Exposures (CVE) database. The CVE is a “dictionary” of known information security vulnerabilities. When the CVE database is updated, the service reviews the indexed components for any that match the new vulnerability. If the new vulnerability is detected in an image, the service sends an email alert to the maintainers of the image.
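The index-then-match flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Docker's implementation: the function names, the layer and component data, and the placeholder CVE IDs are all hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a component's bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def index_image(layers):
    """Index every component in every layer by its SHA.

    layers: {layer_id: {component_name: component_bytes}} (hypothetical shape)
    Returns {sha: [(layer_id, component_name), ...]}.
    """
    index = {}
    for layer_id, components in layers.items():
        for name, content in components.items():
            index.setdefault(sha256_hex(content), []).append((layer_id, name))
    return index


def match_vulnerabilities(index, vuln_db):
    """Compare indexed SHAs against a vulnerability database.

    vuln_db: {sha: [cve_id, ...]} (a stand-in for the real CVE data)
    Returns a sorted list of (layer_id, component_name, cve_id), one entry
    per vulnerability, since one component can carry several.
    """
    findings = []
    for sha, cves in vuln_db.items():
        for layer_id, name in index.get(sha, []):
            for cve in cves:
                findings.append((layer_id, name, cve))
    return sorted(findings)


# Hypothetical image with two layers and placeholder CVE IDs.
layers = {
    "layer-1": {"libfoo-1.0": b"libfoo 1.0 contents"},
    "layer-2": {"appbar-2.3": b"appbar 2.3 contents"},
}
vuln_db = {sha256_hex(b"libfoo 1.0 contents"): ["CVE-XXXX-0001", "CVE-XXXX-0002"]}

index = index_image(layers)
findings = match_vulnerabilities(index, vuln_db)
for layer_id, name, cve in findings:
    print(f"{layer_id}: {name} matches {cve}")
```

Because the index is kept after the first scan, a database update only needs to re-run the matching step, which mirrors how the service is described as reviewing already-indexed components when new vulnerabilities are published.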

    A single component can contain multiple vulnerabilities or exposures and Docker Security Scanning reports on each one. You can click an individual vulnerability report from the scan results and navigate to the specific CVE report data to learn more about it.

    On the Nutanix side of the fence, all code is scanned with two different vulnerability scanners at every step of the development life cycle. To top that off, Nutanix already applies an intrinsic security baseline, and we monitor and self-heal that baseline with Security Configuration Management Automation (SCMA), which leverages the SaltStack framework so that your production systems can self-heal from any deviation and are always in compliance. Features like two-factor authentication (2FA) and cluster lockdown further enhance the security posture. A cluster-wide setting can forward all logs to a central host as well. All CVEs related to the product are tracked, and there is an internal turnaround time of 72 hours for critical patches! There is some added time for getting a release cut, but it is fast, and everything is tested as a whole instead of as a one-off change that could have a domino effect.

    When evaluating infrastructure and development environments for a security-conscious environment, it’s imperative to choose one that is built with a security-first approach and that continually iterates on patching new threats, thereby reducing the attack surface. Docker is doing some great work on this front.


      Jul
      14

      Nutanix Acropolis File Services – Required 2 Networks

      When configuring Acropolis File Services you may be prompted with the following message:

      “File server creation requires two unique networks to be configured beforehand.”

      The reason is that you need two managed networks for AFS. I’ve seen this come up a lot lately, so I thought I would explain the why. While it may change over time, this is the current design.

      fs-tor

      The above diagram shows one file server VM running on a node, but you can put multiple file server VMs on a node for multitenancy.

      The file server VM has two network interfaces. The first interface is a static address used for the local file server VM service that talks to the Minerva CVM service running on the Controller VM. The Minerva CVM service uses this information to manage deployment and failover; it also allows control over one-click upgrades and maintenance. Having local awareness from the CVM enables the file server VM to determine if a storage fault has occurred and, if so, if action should be taken to rectify it. The local address also lets the file server VM claim vDisks for failover and failback. The file server VM service sends a heartbeat to its local Minerva CVM service each second, indicating its state and that it’s alive.
      The second network interface on the file server VM, also referred to as the public interface, is where the file server VM services SMB requests from clients. Based on the resource called, the file server VM determines whether to service the request locally or to use DFS to refer the request to the appropriate file server VM that owns the resource. This second network can be dynamically reassigned to other file server VMs for high availability.
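The serve-locally-or-refer decision on the public interface can be pictured with a small sketch. It is a simplified illustration, not AFS code: the `owners` map, the `Response` type, the FSVM names and the share paths are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    kind: str    # "serve" when handled locally, "referral" when redirected
    target: str  # the file server VM that handles the resource


def handle_request(local_fsvm, resource, owners):
    """Decide whether this file server VM serves a request or refers it.

    owners: {resource: owning_fsvm} (hypothetical ownership map)
    """
    owner = owners[resource]
    if owner == local_fsvm:
        return Response("serve", local_fsvm)
    # Another file server VM owns the resource, so send the client there,
    # which is the role DFS referrals play in the description above.
    return Response("referral", owner)


# Hypothetical ownership of two user folders across two file server VMs.
owners = {r"\users\alice": "fsvm-1", r"\users\bob": "fsvm-2"}

local = handle_request("fsvm-1", r"\users\alice", owners)
referred = handle_request("fsvm-1", r"\users\bob", owners)
print(local)
print(referred)
```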

      If you need help setting up the two managed networks, there is a KB article on portal.nutanix.com -> KB3406

      Jul
      13

      Backing up AFS with Commvault

      This is by no means a best practice guide for AFS and Commvault, but I wanted to make sure that Commvault could be used to back up Acropolis File Services (AFS). If you want more details on AFS, I suggest reading this great post on the Nutanix website.

      Once I applied the file server license to CommServe I was off to the races. I had 400 users spread out on 3 file server VMs making up the file server called eucfs.tenanta.com. The file server had two shares but I was focused on backing up the user share.


      I found performance could be increased by adding more readers for the backup job. My media agent was last configured with 8 vCPU and it seemed to be the bottleneck. If I were to give the media agent more CPU, I am sure I would have had an even faster backup time.


      I was able to get almost 600 GB/hour, which I am told is a good number for file backup. It looks like there is lots of room to improve, though. The end goal will be to try to back up a million files and see what happens over the course of time.
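To put the 600 GB/hour figure in perspective, here is a quick bit of arithmetic. It assumes decimal units (1 GB = 1000 MB), and the 3000 GB dataset size is a made-up number for illustration, not something measured in this test.

```python
def gb_per_hour_to_mb_per_sec(gb_per_hour):
    """Convert a backup rate from GB/hour to MB/s (1 GB = 1000 MB assumed)."""
    return gb_per_hour * 1000 / 3600


def backup_hours(dataset_gb, gb_per_hour):
    """Estimate how long a full backup takes at a steady rate."""
    return dataset_gb / gb_per_hour


rate_mb_s = gb_per_hour_to_mb_per_sec(600)
window = backup_hours(3000, 600)  # hypothetical 3000 GB user share
print(f"{rate_mb_s:.1f} MB/s, {window:.1f} hours for 3000 GB")
```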


      Like all good backup stories, it’s all about the restores and it appears to drill down real nice.


      Jul
      12

      Just In Time Desktops (Instant Clones) on Nutanix

      JIT desktops are supported on Nutanix. One current limitation of JIT is that it doesn’t support VAAI for NFS hardware clones. The great part for Nutanix customers is that where VAAI clones stop, Shadow Clones kick into effect! So if you want to keep a lower amount of RAM configured for the View Storage Accelerator, you’re perfectly OK in doing that.

      The Nutanix Distributed Filesystem has a feature called ‘Shadow Clones’ which allows for distributed caching of particular vDisks or VM data which is in a ‘multi-reader’ scenario. A great example of this is a VDI deployment, where many ‘linked clones’ forward read requests to a central master or ‘Base VM’. In the case of VMware View this is called the replica disk and is read by all linked clones. This will also work in any other multi-reader scenario (e.g. deployment servers, repositories, App Volumes, etc.)

      You can read more about Shadow Clones in this Tech Note -> HERE

      An Introduction to Instant Clones -> HERE

        Jul
        11

        SRM and Commvault Health Check

        The NCC health check pds_share_vms_check verifies that the protection domains do not share any VMs. It would be good practice to run this health check after configuring either SRM or IntelliSnap from Commvault. It’s one of over 200 checks NCC provides.

        This check is available from the NCC 2.2.5 release and is part of the full health check that you can run by using the following command:

        nutanix@cvm$ ncc health_checks run_all

        You can also run this check separately by using the following command:

        nutanix@cvm$ ncc health_checks data_protection_checks protection_domain_checks pds_share_vms_check

        A protection domain is a group of VMs that you can replicate together on a desired schedule.

        A VM can be part of two protection domains if the following conditions are met:

        * A protection domain (Async DR or Metro Availability) is created, and the VM is added as a protected entity of this protection domain.
        * The vstore containing the VM is protected by using ncli or by an external third-party product such as Commvault or SRM. Protecting a vstore automatically creates a protection domain.

        These protection mechanisms are mutually exclusive, which means that the backups of the VM might fail if the VM is in two protection domains.
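The condition this check looks for can be illustrated with a short sketch. This is not how NCC implements pds_share_vms_check; the function name, the input shape and the protection domain and VM names are hypothetical and only show the idea of finding VMs that belong to more than one protection domain.

```python
from collections import defaultdict


def vms_in_multiple_pds(protection_domains):
    """Return the VMs that appear in more than one protection domain.

    protection_domains: {pd_name: [vm_name, ...]} (hypothetical input shape)
    Returns {vm_name: [pd_name, ...]} with the domain names sorted.
    """
    membership = defaultdict(set)
    for pd_name, vms in protection_domains.items():
        for vm in vms:
            membership[vm].add(pd_name)
    return {vm: sorted(pds) for vm, pds in membership.items() if len(pds) > 1}


# Hypothetical cluster: one manually created domain and one created
# automatically when a vstore was protected by a third-party product.
protection_domains = {
    "pd-async": ["vm1", "vm2"],
    "pd-vstore": ["vm2", "vm3"],
}

shared = vms_in_multiple_pds(protection_domains)
print(shared)
```

An empty result corresponds to the healthy case, where every VM sits in at most one protection domain.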

        Solution

        If the check returns a FAIL status, the reported VMs need to be removed from some of the listed protection domains, so that they remain only inside one protection domain.
        If you’re using Metro Availability, you may have to move the VM to another container or stop protecting the vstore.

        Jul
        09

        Making A Better Distributed System – Nutanix Degraded Node Detection


        Distributed systems are hard; there’s no doubt about that. One of the major problems is what to do when a node is unhealthy and may be affecting the performance of the overall cluster. Fail hard, fail fast is a distributed systems principle, but how do you go about detecting an issue before a failure even occurs? AOS 4.5.3, 4.6.2 and 4.7 will include the Nutanix implementation of degraded node detection and isolation. A badly performing hardware component or network issue can be death by a thousand cuts, versus a failure, which is pretty cut and dried. If a remote CVM is not performing well, it can affect the acknowledgement of writes coming from other hosts, and other factors may affect performance, like:

        * Significant network bandwidth reduction
        * Network packet drops
        * CPU Soft lockups
        * Partially bad disks
        * Hardware issues

        The list of issues can even be unknown, so Nutanix Engineering has come up with a scoring system that uses votes to make sure everything can be compared.
        Services running on each node of the cluster will publish scores/votes for services running on other nodes. Peer health scores will be computed based on various metrics like RPC latency, RPC failures/timeouts, network latency, etc. If services running on one node consistently receive bad scores for a long period (~10 minutes), then the other peers will convict that node as a degraded node.
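A toy model of the peer voting idea might look like the sketch below. It is not the AOS implementation: the score scale, the bad-score threshold, the majority rule, and the assumption of one sampling round per minute are all invented for illustration. Only the "consistently bad for roughly 10 minutes" rule comes from the description above.

```python
from collections import defaultdict, deque

BAD_SCORE_THRESHOLD = 50  # hypothetical: scores below this count as a bad vote
WINDOW_ROUNDS = 10        # assumes one round per minute, so about 10 minutes


class DegradedNodeDetector:
    def __init__(self, window=WINDOW_ROUNDS, threshold=BAD_SCORE_THRESHOLD):
        self.window = window
        self.threshold = threshold
        # Per node, remember whether each of the last `window` rounds was bad.
        self.history = defaultdict(lambda: deque(maxlen=window))

    def record_round(self, scores):
        """Record one round of votes: {target_node: {peer_node: score}}."""
        for target, peer_scores in scores.items():
            bad_votes = sum(1 for s in peer_scores.values() if s < self.threshold)
            # A round is bad for the target when a majority of peers vote bad.
            self.history[target].append(bad_votes * 2 > len(peer_scores))

    def degraded_nodes(self):
        """Nodes whose rounds were bad for the entire window."""
        return sorted(
            node
            for node, rounds in self.history.items()
            if len(rounds) == self.window and all(rounds)
        )


detector = DegradedNodeDetector()
for _ in range(WINDOW_ROUNDS):
    detector.record_round({
        "node-a": {"node-b": 90, "node-c": 85},
        "node-c": {"node-a": 20, "node-b": 30},
    })
convicted = detector.degraded_nodes()
print(convicted)

# A single healthy round breaks the streak, so the node is no longer convicted.
detector.record_round({"node-c": {"node-a": 95, "node-b": 90}})
after_recovery = detector.degraded_nodes()
print(after_recovery)
```

Requiring the whole window to be bad is what keeps a short network blip from convicting an otherwise healthy node.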

        Walk, Crawl, Run – Degraded Node Expectations:

        * A node will not be marked as degraded if the current cluster Fault Tolerance (FT) level is less than the desired value.
        * Upgrades and break-fix actions will not be allowed while a node is in the degraded state.
        * A node will only be marked as degraded if we get bad peer health scores for 10 minutes.
        * In AOS 4.5.3, the first shipping AOS release to include this feature, the default settings are that degraded node logging will be enabled but degraded node action will be disabled.
        * In AOS 4.7 and AOS 4.6.2, additional user controls will be provided to select an “action policy” for when a degraded node is detected. Options should include No Action, Reboot CVM or Shutdown Node.
        * While the peer scoring is always on, the action side is disabled for the first release as an ultra-conservative approach.

        In AOS 4.5.3, if the degraded node action setting is enabled, leadership of critical services will not be hosted on the degraded node. A degraded node will be put into maintenance mode and its CVM will be rebooted. Services will not start on this CVM upon reboot. An alert will be generated for the degraded node.

        To enable the degraded node action setting use the NCLI command:

        nutanix@cvm:~$ ncli cluster edit-params disable-degraded-node-monitoring=false

        The feature will further increase the availability and resilience for Nutanix customers. While top performance numbers grab the headlines, remember the first step is to have a running cluster.

        AI for the control plane………… Maybe we’ll get out voted for our jobs!

        Jul
        07

        Updated Best Practices – Nutanix DR and Backup & vSphere + Commvault

        Two best practices have been updated this week. The Nutanix DR and Backup Best Practices guide is located in the Support Portal.

        <DR and Backup Best Practices>

        The update was around bandwidth sizing and added a link to Wolfram Alpha, which spits out the sizing formula for you.

        The vSphere and Commvault Best Practice Guide added some guidance around IntelliSnap and sizing. At this time, using IntelliSnap with Metro Availability is not supported, but streaming is a fully supported option.

        <link>

        Jul
        06

        Chad Sakac talks about EMC selling Nutanix with Dell Technologies

        What will happen with Dell XC when EMC and Dell come together? Chad Sakac talks about it at the 18:40 mark from the ThinkAhead IT conference.

        From NextConf 2016
        Nutanix and Dell OEM relationship: Dell’s Alan Atkinson spoke to attendees about extending the OEM relationship and continuing to help our joint customers (including Williams) on their journeys to Enterprise Cloud in confidence.

        Jun
        29

        Nutanix Security Configuration Management Automation at Work #DOD #PCI

        A short video of someone changing the security settings for an Apache Tomcat directory and files. It really could be anything: dropping a firewall, opening a port, and the list goes on. The video shows how often the settings are being checked and then we manually run the automation framework to check over 600 DOD/PCI level requirements in minutes.
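The check-and-correct loop shown in the video can be imitated on a very small scale with the sketch below. It is not SCMA or SaltStack code: the `baseline` dictionary, the file name and the permission values are hypothetical, and it assumes a POSIX system where file modes behave as expected.

```python
import os
import stat
import tempfile


def check_and_remediate(baseline):
    """Compare file modes against a baseline and fix any drift.

    baseline: {path: expected_mode} (hypothetical baseline format)
    Returns the list of paths that had drifted and were corrected.
    """
    corrected = []
    for path, expected_mode in baseline.items():
        actual_mode = stat.S_IMODE(os.stat(path).st_mode)
        if actual_mode != expected_mode:
            os.chmod(path, expected_mode)  # restore the baseline setting
            corrected.append(path)
    return corrected


with tempfile.TemporaryDirectory() as tmp:
    conf = os.path.join(tmp, "server.xml")  # stand-in for a Tomcat config file
    open(conf, "w").close()
    baseline = {conf: 0o640}

    os.chmod(conf, 0o777)  # someone loosens the permissions
    first_pass = check_and_remediate(baseline)
    second_pass = check_and_remediate(baseline)
    final_mode = stat.S_IMODE(os.stat(conf).st_mode)

print(len(first_pass), len(second_pass), oct(final_mode))
```

Running the same check repeatedly is safe because a pass that finds no drift changes nothing, which is why a framework like this can be scheduled to run often.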

        Jun
        27

        Nutanix Search to Find, Build, Create and Improve

        To streamline access to features, Nutanix lets you quickly search for data points and reduces the clicks required to find information through the search function. Prism Pro delivers a web-like search engine experience for your Nutanix environment. Administrators can simply enter common tasks and entities into the search bar to perform searches. The interface displays the returned results in four vertical columns, each representing a different type of result relating to the search query.
        The four columns present a list of entities, top analytics about the entities, appropriate actions, related alerts, and help topics that relate to the entities. The help topics provide links to online Nutanix documentation that can help explain features and clarify how to configure them or perform corrective actions.


        The search function offers autocomplete to help administrators identify or complete the string that they want to search for.


        Nutanix embodies a radically new approach to enterprise infrastructure—one that simplifies every step of the infrastructure life cycle, from buying and deploying to managing, scaling, and supporting.

        Read more about managing your infrastructure with Prism Pro from Brian Suhr