Jan
    05

    Nutanix AFS – Domain Activation

    Well, if it’s not DNS stealing hours of your life, the next thing to make your partner angry as you miss family supper is Active Directory (AD). In more complex AD setups you may find yourself going to the command line to attach your AFS instance to AD.

    Some important requirements to remember:

      While a deployment could fail due to AD, the FSVMs (file server VMs) still get deployed. You can do the join-domain process from the UI or NCLI afterwards.


      The user attaching to the domain must be a domain admin or have similar rights. Why? The join-domain process will create 1 computer account in the default Computers OU and create a service principal name (SPN) for DNS. If you don’t use the default Computers OU, you will have to use the organizational-unit option from NCLI to change it to the appropriate OU. The computer account can be created in a specified container by using a forward slash to denote hierarchies (for example, organizational_unit/inner_organizational_unit).

      For example, the command used was:

      ncli> fs join-domain uuid=d9c78493-d0f6-4645-848e-234a6ef31acc organizational-unit="stayout/afs" windows-ad-domain-name=tenanta.com preferred-domain-controller=tenanta-dc01.tenanta.com windows-ad-username=bob windows-ad-password=dfld#ld(3&jkflJJddu

      AFS needs at least 1 writable DC to complete the domain join. After the domain join it can authenticate using a local read-only DC. Timing (latency) may cause problems here. To pick an individual DC you can use preferred-domain-controller from the NCLI.

    NCLI Join-Domain Options

    Entity:
    file-server | fs : Minerva file server

    Action:
    join-domain : Join the File Server to the Windows AD domain specified.

    Required Argument(s):
    uuid : UUID of the FileServer
    windows-ad-domain-name : The windows AD domain the file server is
    associated with.
    windows-ad-username : The name of a user account with administrative
    privileges in the AD domain the file server is associated with.
    windows-ad-password : The password for the above Windows AD account

    Optional Argument(s):
    organizational-unit : An Organizational unit container is where the AFS
    machine account will be created as part of domain join
    operation. Default container OU is "computers". Examples:
    Engineering, Department/Engineering.
    overwrite : Overwrite the AD user account.
    preferred-domain-controller : Preferred domain controller to use for
    all join-domain operations.

    NOTE: preferred-domain-controller needs to be FQDN
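Before pasting a long join-domain line into NCLI, it can help to sanity-check the arguments. The sketch below is only an illustration: the option names come from the help text above, but the helper `check_join_domain_args` and its rough FQDN test (at least two non-empty dotted labels) are my own assumptions, not anything NCLI provides.

```python
# Option names are copied from the ncli help text; the helper itself is hypothetical.
REQUIRED = (
    "uuid",
    "windows-ad-domain-name",
    "windows-ad-username",
    "windows-ad-password",
)
OPTIONAL = ("organizational-unit", "overwrite", "preferred-domain-controller")


def check_join_domain_args(args):
    """Return a list of problems found in a dict of join-domain arguments."""
    problems = []
    for name in REQUIRED:
        if not args.get(name):
            problems.append(f"missing required argument: {name}")
    for name in args:
        if name not in REQUIRED and name not in OPTIONAL:
            problems.append(f"unknown argument: {name}")
    dc = args.get("preferred-domain-controller")
    if dc is not None:
        # Rough FQDN heuristic (an assumption): host.domain style, no empty labels.
        labels = dc.rstrip(".").split(".")
        if len(labels) < 2 or not all(labels):
            problems.append("preferred-domain-controller needs to be an FQDN")
    return problems
```

Run against the example above (uuid, tenanta.com, bob, and tenanta-dc01.tenanta.com as the preferred DC), it should come back with an empty list; swap the DC for the short name tenanta-dc01 and it flags the FQDN requirement.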

    If you need to do further troubleshooting you can ssh into one of the FSVMs and run

    afs get_leader

    Then navigate to the /data/logs directory and look at the minerva logs.

    This shouldn't be an issue in most environments, but I've included the ports used just in case.


    Required AD Permissions

    Delegating permissions in Active Directory (AD) lets the administrator assign permissions in the directory to unprivileged domain users, for example, to let a regular user join machines to the domain without knowing the domain administrator credentials.

    Adding the Delegation
    ---------------------
    To enable a user to join and remove machines to and from the domain:
    - Open the Active Directory Users and Computers (ADUC) console as domain administrator.
    - Right-click the CN=Computers container (or desired alternate OU) and select "Delegate control".
    - Click "Next".
    - Click "Add" and select the required user and click "Next".
    - Select "Create a custom task to delegate".
    - Select "Only the following objects in the folder" and check "Computer objects" from the list.
    - Additionally select the options "Create selected objects in this folder" and "Delete selected objects in this folder". Click "Next".
    - Select "General" and "Property-specific", then select the following permissions from the list:
        - Reset password
        - Read and write account restrictions
        - Read and write DNS host name attributes
        - Validated write to DNS host name
        - Validated write to service principal name
        - Write servicePrincipalName
        - Write Operating System
        - Write Operating System Version
        - Write OperatingSystemServicePack
    - Click "Next".
    - Click "Finish".
    After that, wait for AD replication to finish and then the delegated user can use its credentials to join AFS to a domain.


    Domain Port Requirements

    The following services and ports are used by AFS file server for Active Directory communication.

    UDP and TCP Port 88: Forest level trust authentication for Kerberos
    UDP and TCP Port 53: DNS from client to domain controller and domain controller to domain controller
    UDP and TCP Port 389: LDAP to handle normal queries from client computers to the domain controllers
    UDP and TCP Port 123: NTP traffic for the Windows Time Service
    UDP and TCP Port 464: Kerberos Password Change for replication, user and computer authentication, and trusts
    UDP and TCP Port 3268 and 3269: Global Catalog from client to domain controllers
    UDP and TCP Port 445: SMB protocol for file replication
    UDP and TCP Port 135: Port-mapper for RPC communication
    UDP and TCP High Ports: Randomly allocated TCP high ports for RPC, from port 49152 to port 65535
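If you suspect a firewall is in the way, a quick TCP reachability check against a domain controller can narrow things down. This is a minimal sketch, not a Nutanix tool: `check_tcp_ports` is a hypothetical helper, it only tests TCP (a plain connect cannot verify the UDP side), and it skips the dynamic RPC range. NTP on 123 normally answers over UDP only, so it is left out of the default list.

```python
import socket

# TCP ports from the list above. 123 (NTP, UDP in practice) and the
# dynamic RPC range 49152-65535 are intentionally left out.
AD_TCP_PORTS = (53, 88, 135, 389, 445, 464, 3268, 3269)


def check_tcp_ports(host, ports=AD_TCP_PORTS, timeout=2.0):
    """Return {port: True/False} depending on whether a TCP connect succeeds."""
    results = {}
    for port in ports:
        try:
            # create_connection resolves the name and completes the TCP handshake.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts, and name resolution failures.
            results[port] = False
    return results
```

From a machine on the same network as the FSVMs, calling `check_tcp_ports("tenanta-dc01.tenanta.com")` (the DC name from the earlier example) returns a dictionary you can scan for `False` entries.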

      Dec
      20

      Why losing a disk on Nutanix is no big deal (*No Humans Required)

      When Acropolis DSF detects an accumulation of errors for a particular disk (e.g., I/O errors or bad sectors), it is the Hades service running in the Controller VM that steps in. The purpose of Hades is to simplify the break-fix procedures for disks and to automate several tasks that previously required manual user actions. Hades aids in fixing failing devices before the device becomes unrecoverable.

      Nutanix has a unified component called Stargate that manages the responsibility of receiving and processing data. All read and write requests are sent to the Stargate process running on that node. Once Stargate sees delays in responses to I/O requests to a disk, it marks the disk offline. Hades then automatically removes the disk from the data path and runs smartctl checks against it. If the checks pass, Hades automatically marks the disk online and returns it to service. If Hades’ smartctl checks fail, or if Stargate marks a disk offline three times within one hour (regardless of the smartctl check results), Hades automatically removes the disk from the cluster, and the following occurs:

      • The disk is marked for removal within the cluster Zeus configuration.
      • The disk is unmounted.
      • The red LED of the disk is turned on to provide a visual indication of the failure.
      • The cluster automatically begins to create new replicas of any data that is stored on the disk.

      The failed disk is marked as a tombstoned disk to prevent it from being used again without manual intervention.
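The offline/online decision described above boils down to two rules, which can be sketched in a few lines. This is an illustration of the rules as stated in this post, not Hades source code: the class name, method name, and return values are made up, and only the "three times within one hour" threshold and the smartctl pass/fail outcome come from the text.

```python
from collections import deque

OFFLINE_LIMIT = 3        # offline events within the window that force removal
WINDOW_SECONDS = 3600    # one hour


class DiskHealth:
    """Tracks how often one disk has been marked offline (hypothetical model)."""

    def __init__(self):
        self.offline_events = deque()

    def on_marked_offline(self, now, smart_check_passed):
        """Return 'remove' or 'online' for a disk that was just marked offline."""
        self.offline_events.append(now)
        # Forget offline events that fall outside the one-hour window.
        while now - self.offline_events[0] >= WINDOW_SECONDS:
            self.offline_events.popleft()
        too_many = len(self.offline_events) >= OFFLINE_LIMIT
        # Removal happens on a failed check, or on the third offline event
        # within the hour regardless of the check result.
        if not smart_check_passed or too_many:
            return "remove"
        return "online"
```

A disk that passes its checks at 0, 10, and 20 minutes goes back online twice and is removed on the third event; the same three events spread over more than an hour would keep it in service.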

      When a disk is marked offline, an alert is triggered and the disk is immediately removed from the storage pool by the system. Curator then identifies all extents stored on the failed disk, and Acropolis DSF is then prompted to re-replicate copies of the associated replicas to restore the desired replication factor. By the time the Nutanix administrators become aware of the disk failure via Prism, SNMP trap, or email notification, Acropolis DSF will be well on its way to healing the cluster.

      Acropolis DSF data rebuild architecture provides faster rebuild times and no performance impact to workloads supported by the Nutanix cluster when compared to traditional RAID data protection schemes. RAID groups or sets typically comprise a small number of drives. When a RAID set performs a rebuild operation, typically one disk is selected to be the rebuild target. The other disks that comprise the RAID set must divert enough resources to quickly rebuild the data on the failed disk. This can lead to performance penalties for workloads served by the degraded RAID set.

      Acropolis DSF can distribute remote copies found on any individual disk among the remaining disks in the Nutanix cluster. Therefore Acropolis DSF replication operations can happen as background processes with no impact to cluster operations or performance. Acropolis DSF can access all disks in the cluster at any given time as a single, unified pool of storage resources.

      This architecture provides a very advantageous consequence. As the cluster size grows, the length of time needed to recover from a disk failure decreases as every node in the cluster participates in the replication. Since the data needed to rebuild a disk is distributed throughout the cluster, more disks are involved in the rebuild process. This increases the speed at which the affected extents are re-replicated.
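To get a feel for why more participating disks means a shorter rebuild, here is a back-of-the-envelope model. It is deliberately idealized and entirely my own assumption: the work is split evenly, every disk contributes the same throughput, and network limits and background throttling are ignored. The function name and any numbers you feed it are hypothetical, not Nutanix measurements.

```python
def rebuild_hours(data_tb, disks_participating, mb_per_sec_per_disk):
    """Idealized hours to re-replicate data_tb when work is spread evenly."""
    total_mb = data_tb * 1_000_000  # decimal terabytes to megabytes
    aggregate_mb_per_sec = disks_participating * mb_per_sec_per_disk
    return total_mb / aggregate_mb_per_sec / 3600


# Compare a single rebuild target with a many-to-many rebuild (made-up inputs).
for disks in (1, 20, 40):
    print(disks, "disks:", round(rebuild_hours(4, disks, 50), 2), "hours")
```

In this model the time is simply inversely proportional to the number of disks taking part, which is the point the paragraph above makes about growing clusters.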

      It’s important to note that Nutanix also keeps the performance consistent during the rebuild operations. For hybrid systems, Nutanix rebuilds cold data to cold data so large hard drives do not flood the cache of the SSDs. For all-flash systems, Nutanix has quality of service implemented for backend I/O to prevent user I/O from being impacted.

      In addition to a many-to-many rebuild approach to data availability, the Acropolis DSF data rebuild architecture ensures that all healthy disks are available for use all of the time. Unlike most traditional storage arrays, there’s no need for “hot-spare” or standby drives in a Nutanix cluster. Since data can be rebuilt to any of the remaining healthy disks, reserving physical resources for failures is unnecessary. Once healed, you can lose the next drive/node.

      Dec
      19

      THE WORD FROM GOSTEV – 3rd Party Backups aren’t going away.

      First off, the Veeam newsletter is great and you should sign up. One comment that I found interesting was regarding the need for backups. I’ve always said that while Nutanix has a great integrated backup story, sometimes it doesn’t meet all of the requirements needed by a business. Getting it out of the storage vendor’s hands is a wise decision. While Nutanix and every other vendor does rigorous QA, the fact remains that we’re still human and problems can occur.

      Something like this has to happen once in a while so that everyone is reminded that storage snapshots are not backups – not even if you replicate them to a secondary array, like these folks did > HPE storage crash killed Australian Tax Office. You may still remember the same issue with EMC array crash disabling multiple Swedish agencies for 5 days not so long ago. These things just happen, this is why it is extremely important to make real backups by taking the production data out of the storage vendor’s “world” – whether we’re talking about classic storage architectures, or up and coming hyper-converged vendors (one of which have not been shy marketing < 5 min "backup" windows lately).

      Food for thought. In the end it will be what meets the needs of your business. AKA: can you live with the pain?

      Dec
      14

      AOS 5.0 – Adapt Not React – Performance

      New in AOS 5.0, adaptive replica selection is intelligent data placement for the extent store. Rather than using a purely random selection, placement decisions are based on capacity and queue length; these metrics are used to create a weighted random selection. The pre-5.0 algorithm was great for spreading all of the workload around for fast rebuilds but could cause issues with heterogeneous clusters. In mixed clusters with different tier sizes, CPU strength, and various running workloads, some nodes could be taxed more than others. It also didn’t take into account the need for rebuilding data if the affected nodes were running heavy workloads.

      This new algorithm can prevent weaker nodes from getting overburdened and their hot tier from filling up, and it reduces the risk of having busy disks. It can also allow less utilized nodes to send their replicas to each other, so busier nodes have less replica traffic delivered to them. If we take the example of our storage-only nodes, we can ensure that replicas go to the storage-only nodes while not sending replicas to the other compute-based nodes. This new algorithm also reduces the need to run auto balancing from a capacity perspective. By reducing the need to react, we also reserve CPU cycles for workloads and save on wear and tear of the drives.
      In rudimentary static placement systems, this ability to have adaptive replicas would also solve the problem of moving data that then blows up your cache.

      The two less used nodes send their replication traffic to each other. The high-performing node is not impacted by incoming replica traffic.


      Since we have a high-performing NoSQL database collecting disk usage and performance stats for each disk, we can use those stats to create a fitness value. If we can’t collect stats for a disk, we assume the worst case and assign a low probability, because there is a good chance that something bad is happening to that disk. Once assigned a fitness value, the disks can be selected by a weighted random lottery to prevent some nodes from taking all of the traffic.
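A weighted random lottery is easy to picture in code. The sketch below is only a toy: the real fitness formula is not given in this post, so the `fitness` function here (free capacity divided by queue length plus one, with a small floor when stats are missing) is an assumption of mine, as are all the names.

```python
import random

LOW_WEIGHT = 0.01  # assumed worst-case weight for disks with missing stats


def fitness(free_fraction, queue_length, stats_available=True):
    """Hypothetical score: more free capacity and shorter queues score higher."""
    if not stats_available:
        return LOW_WEIGHT
    return max(LOW_WEIGHT, free_fraction / (1.0 + queue_length))


def pick_replica_disk(disks, rng=random):
    """Weighted random lottery over a {disk_id: fitness} mapping."""
    ids = list(disks)
    weights = [disks[d] for d in ids]
    # random.choices draws proportionally to the weights, so fitter disks win
    # more often but less fit disks still get picked occasionally.
    return rng.choices(ids, weights=weights, k=1)[0]
```

Because the draw is weighted rather than "always pick the best", an idle disk gets most of the replicas without every writer piling onto the same target at once.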

      As the product continues to mature, we’re trying to prevent problems from even happening. Whether it’s VDI, Splunk, SAP, SharePoint, or SQL, your workloads can get very consistent high performance on top of data locality.

      The doctor says prevention is always the best medicine.

      Dec
      10

      Get Ready for AOS 5.0 – Nutanix

      This authentication behavior is changed in AOS 5.0. If you are using Active Directory, you must also assign roles to entities or users, especially before upgrading from a previous AOS version. If you’re not using AD, pass Go and collect $200!

      For customers upgrading their clusters to AOS 5.0:

      * Customers upgrading their clusters to AOS 5.0 will see a pre-upgrade check warning if user authentication is enabled for the Active Directory (AD) service and role permissions are not assigned to any user. The upgrade process will fail in this case.

      Warning - no role mappings



      * The AOS 5.0 Prism service (part of the Prism web console) will not authenticate AD users if role permissions are not configured for those users. This situation effectively locks out existing AD users that previously were allowed to access the Prism 4.x web console and other components such as the Nutanix command line (nCLI).

      Add a Role mapping for your AD Groups or Users


      To upgrade successfully in this case and to maintain existing access, assign roles (role permissions) to entities that are allowed access to Prism before attempting to upgrade your cluster.

      Dec
      01

      Integrated Single Node Backup with Nutanix

      Integrated backup for remote branch offices and small to medium-sized businesses. Single Node Backup uses the NX-1155, which is quotable today. Single Node Backup is a part of AOS 5.0.

      Nov
      14

      Docker Datacenter 2.0 for Virtual Admins

      Just a short video walking through how easy it is to get an environment up and running with Docker Datacenter 2.0 on top of AHV.

      High level points:

      * If you can deploy a VM, you can set up Docker Datacenter
      * Management of new Docker hosts is easily done with pre-generated code to paste into new hosts
      * Docker Datacenter has the ability to run both services and compose apps side by side in the same Docker Datacenter environment

      Later this week I hope to have a post talking about the integration with Docker Datacenter and the Docker trusted registry.

        Oct
        31

        Eliminate Standalone NAS & What’s new with Horizon 7

        Thought I would post the links to 2 new on-demand webinars. The Horizon 7 webinar has some Nutanix content but is mostly focused on Instant Clones, App Volumes and user impact.

        Horizon 7: New Features and How it Impacts User Experience

        The AFS webinar has some great questions and there is a demo at the end as well.

        Eliminate Standalone NAS for your file server needs with Nutanix Acropolis File Services

        Sep
        16

        Serve Files with Enterprise Cloud Agility, Security, and Availability with Acropolis File Services


        Nutanix continues on its Enterprise Cloud journey at the .NEXT On-Tour event in Bangkok, Thailand. Today, we are proud to announce that we are planning to support Acropolis File Services (AFS) on our storage only nodes, the NX-6035C-G5. Acropolis File Services provides a simple and scalable solution for hosting user and shared department files across a centralized location with a single namespace. With Acropolis File Services, administrators no longer waste time with manual configuration or need Active Directory and load balancing expertise. If and when released, this will make 6035C-G5 nodes even more versatile, adding to the current capabilities of serving as a backup or replication target and running Acropolis Block Services.

        [read more]

        Sep
        16

        Build Large File Services Repositories on Nutanix’s Largest Capacity Nodes, the NX-6035C-G5

        Nutanix continues on its Enterprise Cloud journey at the .NEXT On-Tour event in Bangkok, Thailand. Today, we are proud to announce that we are planning to support Acropolis File Services (AFS) on our storage only nodes, the NX-6035C-G5. Acropolis File Services provides a simple and scalable solution for hosting user and shared department files across a centralized location with a single namespace. With Acropolis File Services, administrators no longer waste time with manual configuration or need Active Directory and load balancing expertise. If and when released, this will make 6035C-G5 nodes even more versatile, adding to the current capabilities of serving as a backup or replication target and running Acropolis Block Services.

        [read more here]