Jun
    29

    Nutanix Security Configuration Management Automation at Work #DOD #PCI

    A short video of someone changing the security settings for an Apache Tomcat directory and files. It really could be anything: dropping a firewall, opening a port, and the list goes on. The video shows how often the settings are being checked, and then we manually run the automation framework to check over 600 DOD/PCI-level requirements in minutes.

    Jun
    27

    Nutanix Search to Find, Build, Create and Improve

    To streamline access to features, Nutanix lets you quickly search for data points and reduces the clicks required to find information through the search function. Prism Pro delivers a web-like search engine experience for your Nutanix environment. Administrators can simply enter common tasks and entities into the search bar to perform searches. The interface displays the returned results in four vertical columns, each representing a different type of result relating to the search query.
    The four columns present a list of entities, top analytics about the entities, appropriate actions, related alerts, and help topics that relate to the entities. The help topics provide links to online Nutanix documentation that can help explain features and clarify how to configure them or perform corrective actions.

    search

    The search function offers autocomplete to help administrators identify or complete the string that they want to search for.

    auto

    Nutanix embodies a radically new approach to enterprise infrastructure—one that simplifies every step of the infrastructure life cycle, from buying and deploying to managing, scaling, and supporting.

    Read more about managing your infrastructure with Prism Pro from Brian Suhr

    Jun
    06

    Enterprise Cloud for SMB: Invisible Operations

    SMB (Small and Medium-sized Business) is one of the worst terms in the information technology industry. The acronym implies that there is a big difference between what a small company needs and what a large company needs in terms of requirements. I personally think that couldn’t be further from the truth. Whether you are a company of 500 people or 50,000 people, you still want the highest levels of service and uptime for your customers. Both small and large companies want operational efficiency, fractional consumption, and reduced security risk in the sector they operate in. You can also make the point that the smaller business has to be more efficient than bigger companies because it doesn’t have the same economies of scale as its bigger enterprise counterparts.

    While I don’t think the vast majority of SMBs are saying “I need to get a cloud strategy”, they are saying things like “How can I finish all of this work on my plate before the weekend?”, “How do I get to use feature X so I can put off buying more capacity?”, and “How can I spend more time with the customers instead of waiting for this upgrade to finish?”
    Nutanix is giving the benefits of public cloud to businesses, but on your terms, to answer a lot of the above questions. The time spent reading complex HCLs and performing a nuanced cha-cha of upgrading separate pieces of infrastructure just so you can stay on support doesn’t provide any real value to the business. Along with a mounting headache, a lot of these upgrade activities bring a lot of risk when the tasks are not automated with a health check before you proceed. People want to talk about hardware failure rates, but most often it’s us humans who bring the most risk in regards to downtime.

    A Gartner study projected that through 2015, “80 percent of outages impacting mission-critical services will be caused by people and process issues, and more than 50 percent of those outages will be caused by change, configuration, release integration and hand-off issues.” With these high numbers it’s easy to see why people look to cloud to reduce the risk. Nutanix’s commitment to one-click everything, including upgrades for the hypervisor, Acropolis software, BIOS/BMC and the hard drives, can save countless hours and contribute to uptime.

    A lot of time is also spent on the management layers, which really only exist to run applications. The virtualization management layer has become a sore spot in terms of training and maintenance to keep the lights on. Nutanix’s control plane, Prism, runs on every node and eliminates the need for an outside management layer. When you combine Prism with AHV you don’t have to worry about installing, managing and supporting a product to provide services like analytics, call home and live migration. There is no extra SQL or Oracle licensing to support your management layer.

    Nutanix’s ability to achieve one-click everything is possible because it was designed for web-scale. Scale allows self-healing, the ability to have clusters in mixed versions/states to handle complicated upgrade scenarios, adding capacity independently of the hypervisor, and seamless patching for security updates. A key enabler is the ability to handle metadata in a dynamic fashion: from 3 nodes to 1000 nodes, all Nutanix customers reap the benefits. Let’s take a look at the upgrade use cases that highlight why Nutanix is the best choice for all business sizes, including SMB.

    Upgrade

    So a new version of Acropolis was released and you want to get the updated SCMA (Security Configuration Management Automation) and the new performance improvements. What will be the impact on the running VMs?

    Below is a Nutanix cluster in a healthy state. All VMs are writing 1 copy locally and 1 remote copy. As more writes come into the system, the remote copies are evenly distributed across the cluster because of the intelligent metadata afforded by web-scale technologies.

    smb-ok-1

    Controller VM on Node 1 goes down for an upgrade

    smb-upgrade-1

    The SQL VM will already have knowledge of the controller VMs, so it’s business as usual. No VMs or data need to be moved for an upgrade, so the process is fast and efficient. The system is designed to handle these events gracefully and transparently. In the event of an upgrade or failure, I/Os are redirected to other controller VMs within the cluster. Nutanix ALWAYS writes a minimum of 2 copies of data, which is not the case for other Hyper-Converged Infrastructure (HCI) vendors. By enforcing availability, the new working set can also invalidate old data on the node that is being upgraded. This protects you from data loss if a drive goes down in the system during the upgrade and allows for a quicker rebuild if something bad happens to the node that is being upgraded.
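
    To make the write path above concrete, here is a minimal sketch of a two-copy placement policy. It is not Nutanix code: the function name, the node names, and the "least-loaded node first" rule are assumptions chosen only to illustrate one local copy plus one evenly spread remote copy, and what changes when the local controller VM is unavailable.

```python
from collections import Counter

def place_write(local_node, healthy_nodes, load):
    """Return the two nodes that receive copies of one write.

    Hypothetical policy: keep one copy on the VM's local node when its
    controller VM is up, and send the rest to the least-loaded healthy nodes.
    """
    if len(healthy_nodes) < 2:
        raise RuntimeError("need at least two healthy nodes for two copies")
    targets = [local_node] if local_node in healthy_nodes else []
    # Sort by current load, then by name so ties resolve deterministically.
    candidates = sorted((n for n in healthy_nodes if n not in targets),
                        key=lambda n: (load[n], n))
    targets += candidates[:2 - len(targets)]
    for node in targets:
        load[node] += 1
    return targets

nodes = ["node1", "node2", "node3", "node4"]
load = Counter()

# Healthy cluster: a VM on node1 writes one local and one remote copy.
healthy = [place_write("node1", nodes, load) for _ in range(6)]

# node1's controller VM is down for an upgrade: both copies go elsewhere,
# and no existing data has to be moved first.
during_upgrade = [place_write("node1", nodes[1:], load) for _ in range(6)]
```

    The point of the sketch is that the number of copies never drops below two; only the set of eligible targets changes while a controller VM is being upgraded.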

    This is clearly better than a 3-tier architecture where, typically, if you lose 1 controller you’re down to 50% of your storage performance. Other HCI vendors will give you the option to move all of the data off the system first or use RAID to overcome the oversight. If you take the use case of moving all data off the node, how long will that take?

    Other HCI vendors requiring to move data off the node

    smb-upgrade-hci-1

    By moving data off the node, the flash tier can quickly fill up and queueing will occur. Performance will be impacted, not to mention the time it takes to copy off TBs of data. Some vendors that use RAID have no current way of rebuilding the data on the upgrading/failed node. The point to make here is that having elastic metadata like Nutanix allows you to self-heal and have low impact when carrying out maintenance operations.

    So what happens if the node never comes back up for Nutanix?

    smb-upgrade-faili-1

    Nutanix has the ability to rebuild data at the same tier at which it failed. This keeps cold-tier data from landing on the performance tier, where it would then have to be down-migrated with ILM. HDDs rebuild to other HDDs, and SSDs rebuild to SSDs. This control and limiting of rebuilds helps to prevent network congestion, especially on clusters that are only running 1 Gb networking.
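
    The tier-preserving rebuild can be sketched as a small planning function. This is an illustration under stated assumptions, not the actual rebuild logic: the function name, the data layout, and the per-tier round-robin spreading are hypothetical, and the sketch ignores where the surviving replica already lives.

```python
def plan_rebuild(failed_node, extents, nodes):
    """Map each lost extent to a surviving node, keeping the extent's tier.

    extents: list of (extent_id, tier) that were stored on the failed node.
    nodes:   dict of node name -> set of tiers that node offers.
    """
    survivors = sorted(n for n in nodes if n != failed_node)
    plan = {}
    next_slot = {}  # per-tier round-robin counter to spread rebuild traffic
    for extent_id, tier in extents:
        eligible = [n for n in survivors if tier in nodes[n]]
        if not eligible:
            raise RuntimeError(f"no surviving node offers tier {tier}")
        slot = next_slot.get(tier, 0)
        # Same tier in, same tier out: SSD data never lands on HDD or vice versa.
        plan[extent_id] = (eligible[slot % len(eligible)], tier)
        next_slot[tier] = slot + 1
    return plan

cluster = {
    "node1": {"SSD", "HDD"},
    "node2": {"SSD", "HDD"},
    "node3": {"SSD", "HDD"},
}
lost = [("e1", "SSD"), ("e2", "HDD"), ("e3", "HDD")]
rebuild_plan = plan_rebuild("node1", lost, cluster)
```

    Because each extent is rebuilt on its original tier, nothing has to be migrated between tiers afterwards.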

    While getting down into the weeds can highlight why Nutanix is different from traditional and other HCI vendors, the point is that Nutanix allows you to elevate your IT staff. You can forget the non-value tasks and start to focus on bringing delight to your customers. The person who might wear 5 different hats, like security, networking, storage, virtualization and backup, can now find a glimmer of hope in their daily work life and start to think about work-life balance. Nutanix as the platform for Enterprise Cloud is perfect for any size, especially the small and medium-sized business.

    May
    15

    Commvault Best Practices on Nutanix

    I first remember seeing Commvault in 2007 in the pages of Network World and thought it looked pretty interesting then. At the time I was a CA ARCserve junkie and prayed every day I didn’t have to restore anything. Almost 10 years later tape is still around, virtualization has spawned countless backup vendors, and cloud now makes an easy backup target. Today Commvault is still relevant and plays in all of the aforementioned spaces, and like most tech companies we have our own overlap with them to some degree. For me Commvault just has so many options that it’s almost a problem of what to use where and when.

    The newly released Best Practice Guide with Commvault talks about some of the many options that should be used with Nutanix. Probably the big things that would stand out in my mind if I were new to Nutanix and then read the guide would be the use of a proxy on every host and some of the caveats around IntelliSnap.

    Proxy On Every Host
    bricks-vs-feathers

    What weighs more? A pound of feathers or a pound of bricks? The point here is you need a proxy regardless, and the proxy is sized on how much data you will be backing up. So instead of having 1 giant proxy, you now have smaller proxies distributed across the cluster. Smaller proxies can read from the local hot SSD tier and limit network traffic, so they can help to limit bottlenecks in your infrastructure.

    IntelliSnap is probably one of the most talked about Commvault features. IntelliSnap allows you to create a point-in-time application-consistent snapshot of backup data on the DSF. The backup administrator doesn’t need to log on to Prism to provide this functionality. A Nutanix-based snapshot is created on the storage array as soon as the VMware snapshot is completed; the system then immediately removes the VMware snapshot. This approach minimizes the size of the redo log and shortens the reconciliation process to reduce the impact on the virtual machine being backed up and minimize the storage requirement for the temporary file. It also allows near-instantaneous snapshot mounts for data access.

    With IntelliSnap it’s important to realize that it was invented at a time when LUNs ruled the storage workload. IntelliSnap in some sense turns the giant volumes/containers that Nutanix presents to the hypervisor into a giant LUN. Behind the scenes, when IntelliSnap is used it snaps the whole container, regardless of whether all the VMs in it are being backed up. So you should do a little planning when using IntelliSnap. This is OK, since IntelliSnap should be used for highly transactional VMs and not every VM in the data center. I’d just like to point out that streaming backups with CBT is still a great choice.

    With that being said, you can check out the full guide at the Nutanix website: Commvault Best Practices

    May
    12

    Impact of Nutanix VSS Hardware Support

    When 4.6 was released I wrote about how the newly added VSS support with Nutanix Guest Tools (NGT) was the gem of the release. It was a fairly big compliment considering some of the important updates that were in the release, like cross-hypervisor DR and another giant leap in performance.

    I finally set some time aside to test the impact of taking an application-consistent snapshot with VMware Tools vs the Nutanix VSS Hardware Support.

    vmware-vss-q

    In an application-consistent snapshot workflow without NGT on ESXi, we take an ESXi snapshot so VMware Tools can be used to quiesce the file system. Every time we take an ESXi snapshot, it results in the creation of delta disks. During this process ESXi “stuns” the VM to remap virtual disks to these delta files. The length of the stun depends on the number of virtual disks attached to the VM and the speed at which the delta disks can be created (the capability of the underlying storage to process NFS metadata update operations, plus releasing/creating/acquiring lock files for all the virtual disks). During this time, the VM is totally unresponsive. No application will run inside the VM, and pings to the VM will fail.

    We then delete the snapshot (after backing up the files via hardware snap on the Nutanix side) which results in another set of stuns (deleting a snapshot causes two stuns, one fixed time stun + another stun based on the number of virtual disks). This essentially means that we are causing two or three stuns in rapid succession. These stuns cause meta-data updates in addition to the flushing of data during the VSS snapshot operations.

    Customers have reported that in a set of VMs running Microsoft clustering, VMs can be voted out due to heartbeat failure. VMware gives customers guidance on increasing timers if you’re using Microsoft clustering to get around this situation.

    To test this out I used HammerDB with SQL 2014 running on Windows 2012 R2. The tests were run on ESXi 6.0 with hardware version 11.

    sqlvm

    VMware Tools with VSS based Snapshot
    I was going to try to stitch the images together because of the time it took but decided to leave as is.
    VMware-VSS-1vmwaretools

    VMware-VSS-2vmwaretools

    The total process took ~4 minutes.

    NGT with VSS Hardware Support based Snapshot
    NGT based VSS snapshots don’t cause VM stuns. The application will be stunned temporarily within Windows to flush the data, but pings and other things should work.

    NGT-VSS-Snapshot

    The total process took ~1 minute.

    Conclusion

    NGT with VSS hardware support is the Belle of the Ball! There is no fixed number for the maximum stun time, because it depends on how heavy the workload is, but what we can see is the effect of not using NGT for application-consistent snapshots, and it’s pretty big. The collapsing of ESXi snapshots causes additional load and should be avoided if possible. NGT offers a hypervisor-agnostic approach and currently works with AHV as well.

    Note: Hypervisor snapshot consolidation is better in ESXi 6 than ESXi 5.5.

    Thanks to Karthik Chandrasekaran and Manan Shah for all their hard work and contribution to this blog post.

    Apr
    27

    SAP Best Practices and Sizing on Nutanix

    SAP-NETWEAVER

    At the heart of SAP Business Suite is the SAP ERP application, which is supplemented by SAP CRM, SAP SRM, SAP PLM, and SAP SCM. From financial accounting through manufacturing, logistics, sales, marketing, and human resources, SAP Business Suite manages all the key mission-critical business processes that occur each day in companies around the world. SAP NetWeaver is the technical foundation for many SAP applications; it is a solution stack of SAP’s technology products.

    Deploying and operating SAP Business Suite applications in your environment is not a trivial task. Nutanix enterprise cloud platforms provide the reliability, predictability, and performance that the SAP Business Suite demands, all with an efficient and elegant management interface.

    The Nutanix platform offers SAP customers a range of benefits, including:

    • Lower risk and cost on the first hyperconverged platform SAP-certified for NetWeaver applications.
    • A turnkey validated framework that dramatically reduces the time to deploy your SAP applications.
    • Mission-critical availability with a self-healing foundation and VM-centric data protection, including support for the top enterprise backup solutions.
    • Flexibility to choose among industry-leading SAP-supported hypervisors.
    • Simplified operations, including application- and VM-level metrics alongside single-click provisioning and upgrades.
    • Reduced TCO from infrastructure right-sized for your SAP workload.
    • A best-in-class worldwide support system whose knowledge and commitment to customer service has earned the Omega NorthFace Scoreboard Award for three consecutive years.

    Read the Solution Note for best practices with both Hyper-V and VMware and sizing guidelines => SAP Solution Note

    Apr
    24

    Save Your Time With Nutanix Automatic Support

    Best Industry Support

    The feature known as Pulse is enabled by default and sends cluster status information automatically to Nutanix customer support. After you have completed initial setup, created a cluster, and opened ports 80 or 8443 in your firewall, AOS sends a Pulse message from each cluster once every 24 hours. Each message includes cluster configuration and health status that can be used by Nutanix Support to address any cluster operation issues.
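
    Before relying on Pulse, it can be worth confirming that the firewall really lets outbound connections through on those ports. The helper below is a generic TCP reachability check, not a Nutanix tool; the function names are made up for this example, and the destination host is deliberately left as a parameter because the text above does not name it.

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False

def reachable_ports(host, ports=(80, 8443)):
    """Return the subset of ports on host that accept a TCP connection."""
    return [p for p in ports if port_open(host, p)]
```

    Calling reachable_ports with the support endpoint your cluster is configured to use returns whichever of the two ports accepted a connection; an empty list suggests the firewall is still blocking both.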

    AOS can also send automatic alert email notifications to Nutanix Support by default through ports 80 or 8443. Like Pulse, any configured firewall must have these ports open. Some examples of conditions that will automatically generate a proactive case with Nutanix Support at priority level P4:

    The Stargate process is down for more than 3 hours
    Curator scan fails
    Hardware Clock Failure
    Faulty RAM module
    Power Supply failure
    Unable to fetch IPMI SDR repository (IPMI Error)
    HyperV networking
    System operations
    Disk Capacity > 90%
    Bad Drive

    You can optionally use your own SMTP server to send Pulse and alert notifications. If you do not or cannot configure an SMTP server, another option is to implement an HTTP proxy as part of your overall support scheme.

    While the best thing is never to get a call, 2nd best is not waiting in line to open a ticket. Have a great week!

    Apr
    23

    3rd Generation Erasure Coding (EC-X) – What’s Next?

    Take time for all things: great haste makes great waste. Benjamin Franklin

    I don’t profess to be an erasure coding genius, but I know enough to say that it would be a very poor choice for workloads that have lots of overwrites or cycle through lots of snapshots, and that running erasure coding inline would really only suit a WORM application, which is not typical for a lot of virtual environments. Nutanix first released erasure coding as EC-X in AOS 4.1.3 as a tech preview and has learned lots along the way with its agile software development method.

    With AOS 4.6.1 being released on April 18th more improvements were added for EC-X.

      • Faster reclamation – Simply put, if your EC strip is changed, holes start appearing in the strip. You need an efficient way of plugging the holes so they can be encoded again.
      • Advanced EC-X selection heuristics – Nutanix engineering has come up with an algorithm to determine whether to use blocks from the same virtual hard drive or blocks from throughout the container. Better selection reduces the need to fix strips and reduces CPU load on the cluster. This also helps to fix the problem of cycling through lots of snapshots.
      • Strip compaction – If an EC-X strip has too many holes, the system won’t even try to fill the gaps. It will instead move the data out of the strip.
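
      The choice between plugging holes and compacting a strip can be pictured as a simple threshold decision. The sketch below is only an illustration of that idea: the function name and the 50% cut-off are assumptions, not documented EC-X behavior or values.

```python
def strip_action(blocks, max_hole_ratio=0.5):
    """Decide what to do with an EC strip whose data blocks may be holes.

    blocks:         list where None marks a hole left by overwritten data.
    max_hole_ratio: hypothetical cut-off, not a documented Nutanix value.
    """
    holes = sum(1 for b in blocks if b is None)
    if holes == 0:
        return "keep"      # strip is intact, nothing to do
    if holes / len(blocks) > max_hole_ratio:
        return "compact"   # too many holes: move the live data out of the strip
    return "plug"          # few holes: fill them and re-encode

actions = [
    strip_action(["a", "b", "c", "d"]),
    strip_action(["a", None, "c", "d"]),
    strip_action(["a", None, None, None]),
]
```

      With the assumed cut-off, an intact strip is kept, a strip with one hole in four is plugged, and a strip with three holes in four is compacted.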

      With the mission to enable enterprise cloud, more and more of the features are becoming self-adjusting to truly allow for set and forget. The end goal is to have all the features turned on and let the system decide. I am looking forward to watching the announcements at .Next in June.

      come to  .Next

    Apr
    23

    Operations Getting Down With DJ RunC & ContainerD

    Rundmc_2

    runC and containerd do sound like some rappers from the 80’s. While in the land of hip hop Run–D.M.C. was legendary in creating new school rap, Docker has thrown its inertia behind runC and containerd to pave the way for future success. runC is an implementation of the Open Container Initiative (OCI) spec, a project to which Docker has donated a huge chunk of its own work. runC is a standalone binary that allows you to run a single OCI container. This is big because now everyone has a standard way to run a container, which creates better portability and good code hygiene.

    containerd is a new piece of infrastructure plumbing that allows you to run multiple containers using runC. It’s kinda like a simple init system. containerd takes care of the simple CRUD operations against containers, but image management still lives with the Docker Engine. containerd is also event-driven, so you can build on top of it.

    2016-04-22_23-31-30

    With the release of Docker 1.11, runC and containerd are fully integrated. I think this is important because if you’re going to pick a horse in the container race, you have a company in Docker that is leading with committers for OCI, which is essentially helping to set the direction for containers. On the operations side of the house, if I have to upgrade the Docker Engine, there is now a road map to upgrade without affecting running containers. It’s great containers can run and die but it’s even better if they never fail :-)

    Docker 1.11 also added DNS round-robin load balancing. While it may seem crude to the likes of an F5 or NetScaler engineer, I always find simple wins and see it used in lots of places. If you give multiple containers the same alias, Docker’s service discovery will return the addresses of all of the containers for round-robin DNS.
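
    For anyone who has not seen round-robin DNS up close, here is a toy resolver that shows the behavior described above. It is not Docker’s implementation: the class name, the alias, and the IP addresses are invented for the example, and it only models the idea that every lookup returns all addresses with the starting point rotated.

```python
from collections import deque

class RoundRobinResolver:
    """Toy alias -> addresses table that rotates its answer on every lookup."""

    def __init__(self):
        self._records = {}

    def register(self, alias, address):
        # Several containers may register under the same alias.
        self._records.setdefault(alias, deque()).append(address)

    def resolve(self, alias):
        addresses = self._records[alias]
        answer = list(addresses)   # return every address for the alias
        addresses.rotate(-1)       # next lookup starts with the next address
        return answer

resolver = RoundRobinResolver()
for ip in ("10.0.0.2", "10.0.0.3", "10.0.0.4"):
    resolver.register("web", ip)

# Clients that simply use the first address end up spread across containers.
first_choices = [resolver.resolve("web")[0] for _ in range(4)]
```

    There is no health checking or weighting here, which is exactly why it looks crude next to a dedicated load balancer, and also why it is so easy to reason about.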

    I think the 1.11 release of Docker will continue to build great things. Let’s just hope it doesn’t lead to overplayed Run–D.M.C. spoof shirts.

    Apr
    17

    Quickly Pin Your Virtual Hard Drive To Flash #vExpert #NTC

    If you need to ensure performance with Flash Mode here is a quick way to get your job done.

    Find the disk UUID
    ncli virtual-disk ls | grep -B 3 -A 6 <disk-or-VM-name>

    pin-flash1

    Example
    ncli virtual-disk ls | grep m1_8 -B 3 -A 6

    Virtual Disk Id : 00052faf-34c2-58fc-64dd-0cc47a673b8c::313a49:6000C29b-93c9-bfe1-58d9-e718993e5a06
    Virtual Disk Uuid : 1dc11a7f-63ac-422a-ac27-442d5fcfc91a
    Virtual Disk Path : /hdfs/cdh-m1/cdh-m1_8.vmdk
    Attached VM Name : cdh-m1
    Cluster Uuid : 00052faf-34c2-58fc-64dd-0cc47a673b8c
    Virtual Disk Capacity : 268435456000
    Pinning Enabled : False

    Pin 25 GB of the vdisk to flash
    ncli virtual-disk update-pinning id=00052faf-34c2-58fc-64dd-0cc47a673b8c::313a49:6000C29b-93c9-bfe1-58d9-e718993e5a06 pinned-space=25 tier-name=SSD-SATA

    Pinned Space is in GB.

    In this case I was pinning a Hadoop NameNode’s directories to flash because I wanted to include its physical node in the cluster to help with replication traffic.
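
    If you do this often, the lookup and the update can be glued together with a few lines of scripting. The sketch below only parses text shaped like the ncli listing in the example and builds the update-pinning command string; it does not call ncli. The helper names are hypothetical, and treating Virtual Disk Capacity as bytes is an assumption.

```python
SAMPLE = """\
Virtual Disk Id : 00052faf-34c2-58fc-64dd-0cc47a673b8c::313a49:6000C29b-93c9-bfe1-58d9-e718993e5a06
Virtual Disk Uuid : 1dc11a7f-63ac-422a-ac27-442d5fcfc91a
Virtual Disk Path : /hdfs/cdh-m1/cdh-m1_8.vmdk
Attached VM Name : cdh-m1
Cluster Uuid : 00052faf-34c2-58fc-64dd-0cc47a673b8c
Virtual Disk Capacity : 268435456000
"""

def parse_fields(text):
    """Turn 'Key : Value' lines into a dict, splitting on the first ' : '."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

def pinning_command(fields, pinned_gb, tier="SSD-SATA"):
    """Build (but do not run) the update-pinning command for one vdisk."""
    return ("ncli virtual-disk update-pinning "
            f"id={fields['Virtual Disk Id']} "
            f"pinned-space={pinned_gb} tier-name={tier}")

disk = parse_fields(SAMPLE)
# Assumption: the capacity field is reported in bytes.
capacity_gib = int(disk["Virtual Disk Capacity"]) / 2**30
command = pinning_command(disk, 25)
```

    Under the bytes assumption, the example vdisk works out to 250 GiB, so pinning 25 GB keeps roughly a tenth of it on flash.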