Nov
    19

    Nutanix Additional Cluster Health Tooling: Panacea

There are over 450 health checks in the Cluster Health UI inside Prism Element. To provide additional help, a new script called “panacea” has been added. Panacea is bundled with NCC 3.5 and later and provides a user-friendly interface for very advanced troubleshooting. The Nutanix Support team can take these logs and correlate results so you don’t have to wait for the problem to reoccur before fixing the issue.

The ability to quickly track retransmissions at very fine granularity across a distributed system is very important. I am hoping this new tooling will eventually feed into Nutanix’s degraded node detection. Panacea can be run for a specific time interval during which logs will be analyzed; the possible options are:
--last_no_of_hours
--last_no_of_days
--start_time
--end_time

Log in to any CVM within the cluster; the command can be run from /home/nutanix/ncc/panacea/.
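For example, to analyze only the last few hours of logs (a hedged sketch: the flag comes from the options above, but the launcher script name is an assumption, so list the directory to confirm it on your NCC build):

cd /home/nutanix/ncc/panacea/
ls                                  # confirm the actual launcher name on your NCC build
./panacea.py --last_no_of_hours=4   # hypothetical launcher name; analyze the last 4 hours of logs

You can bound the window explicitly with --start_time and --end_time instead.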

The output below comes from using the tool to dig into network information.

Network outages can cause degraded performance. Cluster network outage detection is based on the following schemes:
    1) Cassandra Paxos Request Timeout Exceptions/Message Drops
    2) CVM Degraded node scoring
    3) Ping latency

In some cases, an intermittent network issue might NOT be reflected in ping latency, but it does have an impact on TCP throughput and packet retransmission, leading to more request timeout exceptions.

TCP Retransmission:
-------------------
By default, Panacea tracks the TCP connections (destination port 7000) used by Cassandra between peer CVMs. This table displays stats on packet retransmissions per minute in each TCP socket. Frequent retransmission can delay the application and may reflect congestion on the host or in the network. (A small awk sketch of how the Mean and Ratio columns are derived follows the table.)
1) Local: local CVM IP address
2) Remote: remote CVM IP address
3) Max/Mean/Min/STD: number of retransmissions/min, calculated from samples where retransmission happened
4) 25%/50%/75%: value distribution; the value below which 25%, 50%, and 75% of the samples fall
5) Ratio: N/M, where N = number of samples where retransmission happened and M = total samples in the entire data set

+--------------+--------------+-------+------+------+------+------+------+------+---------+
| Local        | Remote       | Max   | Mean | Min  | STD  | 25%  | 50%  | 75%  | Ratio   |
+--------------+--------------+-------+------+------+------+------+------+------+---------+
| XX.X.XXX.110 | XX.X.XXX.109 | 19.00 | 1.61 | 1.00 | 1.90 | 1.00 | 1.00 | 2.00 | 133/279 |
| XX.X.XXX.111 | XX.X.XXX.109 | 11.00 | 2.41 | 1.00 | 1.54 | 1.00 | 2.00 | 3.00 | 236/280 |
| XX.X.XXX.112 | XX.X.XXX.109 | 12.00 | 2.40 | 1.00 | 1.59 | 1.00 | 2.00 | 3.00 | 235/279 |
| XX.X.XXX.109 | XX.X.XXX.110 | 32.00 | 3.04 | 1.00 | 2.70 | 1.00 | 2.00 | 4.00 | 252/279 |
| XX.X.XXX.111 | XX.X.XXX.110 |  9.00 | 1.51 | 1.00 | 1.02 | 1.00 | 1.00 | 2.00 | 152/280 |
| XX.X.XXX.112 | XX.X.XXX.110 | 11.00 | 2.21 | 1.00 | 1.31 | 1.00 | 2.00 | 3.00 | 231/279 |
| XX.X.XXX.109 | XX.X.XXX.111 |  9.00 | 2.01 | 1.00 | 1.20 | 1.00 | 2.00 | 2.00 | 202/279 |
| XX.X.XXX.110 | XX.X.XXX.111 | 10.00 | 2.70 | 1.00 | 1.68 | 1.00 | 2.00 | 3.00 | 244/279 |
| XX.X.XXX.112 | XX.X.XXX.111 |  4.00 | 1.46 | 1.00 | 0.76 | 1.00 | 1.00 | 2.00 | 135/279 |
| XX.X.XXX.109 | XX.X.XXX.112 |  5.00 | 1.56 | 1.00 | 0.85 | 1.00 | 1.00 | 2.00 | 150/279 |
| XX.X.XXX.110 | XX.X.XXX.112 |  6.00 | 2.05 | 1.00 | 1.18 | 1.00 | 2.00 | 3.00 | 234/279 |
| XX.X.XXX.111 | XX.X.XXX.112 | 16.00 | 3.26 | 1.00 | 2.24 | 2.00 | 3.00 | 4.00 | 261/280 |
+--------------+--------------+-------+------+------+------+------+------+------+---------+
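To make the Mean and Ratio columns concrete, here is a rough awk sketch (my own illustration, not Panacea’s implementation) that derives them the way the legend describes, from a hypothetical file holding one retransmissions-per-minute sample per line:

# retrans_per_min.txt: one sample per line = retransmissions observed in that minute
awk '
  { total++ }                                   # M: every sample in the data set
  $1 > 0 { hits++; sum += $1 }                  # N: only samples where retransmission happened
  END {
    if (hits) printf "Mean=%.2f Ratio=%d/%d\n", sum / hits, hits, total
    else      printf "Mean=n/a  Ratio=0/%d\n", total
  }' retrans_per_min.txt

Because Mean is computed only over the minutes where retransmissions actually occurred, a low Ratio with a high Mean can still point at short, bursty congestion rather than a steady problem.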

Most of the 450+ Cluster Health checks inside Prism, all with automatic alerting:

    CVM | CPU
    CPU Utilization

    Load Level

    Node Avg Load – Critical

    CVM | Disk
    Boot RAID Health

    Disk Configuration

    Disk Diagnostic Status

    Disk Metadata Usage

    Disk Offline Status

    HDD Disk Usage

    HDD I/O Latency

    HDD S.M.A.R.T Health Status

    Metadata Disk Mounted Check

    Metro Vstore Mount Status

    Non SED Disk Inserted Check

    Nutanix System Partitions Usage High

    Password Protected Disk Status

    Physical Disk Remove Check

    Physical Disk Status

    SED Operation Status

    SSD I/O Latency

    CVM | Hardware
    Agent VM Restoration

    FT2 Configuration

    Host Evacuation Status

    Node Status

    VM HA Healing Status

    VM HA Status

    VMs Restart Status

    CVM | Memory
    CVM Memory Pinned Check

    CVM Memory Usage

    Kernel Memory Usage

    CVM | Network
    CVM IP Address Configuration

    CVM NTP Time Synchronization

    Duplicate Remote Cluster ID Check

    Host IP Pingable

    IP Configuration

    SMTP Configuration

    Subnet Configuration

    Virtual IP Configuration

    vCenter Connection Check

    CVM | Protection Domain
    Entities Restored Check

    Restored Entities Protected

    CVM | Services
    Admin User API Authentication Check

    CVM Rebooted Check

    CVM Services Status

    Cassandra Waiting For Disk Replacement

    Certificate Creation Status

    Cluster In Override Mode

    Cluster In Read-Only Mode

    Curator Job Status

    Curator Scan Status

    Kerberos Clock Skew Status

    Metadata Drive AutoAdd Disabled Check

    Metadata Drive Detached Check

    Metadata Drive Failed Check

    Metadata Drive Ring Check

    Metadata DynRingChangeOp Slow Check

    Metadata DynRingChangeOp Status

    Metadata Imbalance Check

    Metadata Size

    Node Degradation Status

    RemoteSiteHighLatency

    Stargate Responsive

    Stargate Status

    Upgrade Bundle Available

    CVM | Storage Capacity
    Compression Status

    Finger Printing Status

    Metadata Usage

    NFS Metadata Size Overshoot

    On-Disk Dedup Status

    Space Reservation Status

    vDisk Block Map Usage

    vDisk Block Map Usage Warning

    Cluster | CPU
    CPU type on chassis check

    Cluster | Disk
    CVM startup dependency check

    Disk online check

    Duplicate disk id check

    Flash Mode Configuration

    Flash Mode Enabled VM Power Status

    Flash Mode Usage

    Incomplete disk removal

    Storage Pool Flash Mode Configuration

    System Defined Flash Mode Usage Limit

    Cluster | Hardware
    Power Supply Status

    Cluster | Network
    CVM Passwordless Connectivity Check

    CVM to CVM Connectivity

    Duplicate CVM IP check

    NIC driver and firmware version check

    Time Drift

    Cluster | Protection Domain
    Duplicate VM names

    Internal Consistency Groups Check

    Linked Clones in high frequency snapshot schedule

    SSD Snapshot reserve space check

    Snapshot file location check

    Cluster | Remote Site
    Cloud Remote Alert

    Remote Site virtual external IP(VIP)

    Cluster | Services
    AWS Instance Check

    AWS Instance Type Check

    Acropolis Dynamic Scheduler Status

    Alert Manager Service Check

    Automatic Dedup disabled check

    Automatic disabling of Deduplication

    Backup snapshots on metro secondary check

    CPS Deployment Evaluation Mode

    CVM same timezone check

    CVM virtual hardware version check

    Cassandra Similar Token check

    Cassandra metadata balanced across CVMs

    Cassandra nodes up

    Cassandra service status check

    Cassandra tokens consistent

    Check that cluster virtual IP address is part of cluster external subnet

    Checkpoint snapshot on Metro configured Protection Domain

    Cloud Gflags Check

    Cloud Remote Version Check

    Cloud remote check

    Cluster NCC version check

    Cluster version check

    Compression disabled check

    Curator scan time elapsed check

    Datastore VM Count Check

    E-mail alerts check

    E-mail alerts contacts configuration

    HTTP proxy check

    Hardware configuration validation

    High disk space usage

    Hypervisor version check

    LDAP configuration

    Linked clones on Dedup check

    Multiple vCenter Servers Discovered

    NGT CA Setup Check

    Oplog episodes check

    Pulse configuration

    RPO script validation on storage heavy cluster

    Remote Support Status

    Report Generation Failure

    Report Quota Scan Failure

    Send Report Through E-mail Failure

    Snapshot chain height check

    Snapshots space utilization status

    Storage Pool SSD tier usage

    Stretch Connectivity Lost

    VM group Snapshot and Current Mismatch

    Zookeeper active on all CVMs

    Zookeeper fault tolerance check

    Zookeeper nodes distributed in multi-block cluster

    vDisk Count Check

    Cluster | Storage Capacity
    Erasure Code Configuration

    Erasure Code Garbage

    Erasure coding pending check

    Erasure-Code-Delay Configuration

    High Space Usage on Storage Container

    Storage Container RF Status

    Storage Container Space Usage

    StoragePool Space Usage

    Volume Group Space Usage

    Data Protection | Protection Domain
    Aged Third-party Backup Snapshot Check

    Check VHDX Disks

    Clone Age Check

    Clone Count Check

    Consistency Group Configuration

    Cross Hypervisor NGT Installation Check

    EntityRestoreAbort

    External iSCSI Attachments Not Snapshotted

    Failed To Mount NGT ISO On Recovery of VM

    Failed To Recover NGT Information

    Failed To Recover NGT Information for VM

    Failed To Snapshot Entities

    Incorrect Cluster Information in Remote Site

    Metadata Volume Snapshot Persistent

    Metadata Volume Snapshot Status

    Metro Availability

    Metro Availability Prechecks Failed

    Metro Availability Secondary PD sync check

    Metro Old Primary Site Hosting VMs

    Metro Protection domain VMs running at Sub-optimal performance

    Metro Vstore Symlinks Check

    Metro/Vstore Consistency Group File Count Check

    Metro/Vstore Protection Domain File Count Check

    NGT Configuration

    PD Active

    PD Change Mode Status

    PD Full Replication Status

    PD Replication Expiry Status

    PD Replication Skipped Status

    PD Snapshot Retrieval

    PD Snapshot Status

    PD VM Action Status

    PD VM Registration Status

Protected VM CBR Capability

    Protected VM Not Found

    Protected VMs Not Found

    Protected VMs Storage Configuration

    Protected Volume Group Not Found

    Protected Volume Groups Not Found

    Protection Domain Decoupled Status

    Protection Domain Initial Replication Pending to Remote Site

    Protection Domain Replication Stuck

    Protection Domain Snapshots Delayed

    Protection Domain Snapshots Queued for Replication to Remote Site

    Protection Domain VM Count Check

    Protection Domain fallback to lower frequency replications to remote

    Protection Domain transitioning to higher frequency snapshot schedule

    Protection Domain transitioning to lower frequency snapshot schedule

    Protection Domains sharing VMs

    Related Entity Protection Status

    Remote Site NGT Support

    Remote Site Snapshot Replication Status

    Remote Stargate Version Check

    Replication Of Deduped Entity

    Self service restore operation Failed

    Snapshot Crash Consistent

    Snapshot Symlink Check

    Storage Container Mount

    Updating Metro Failure Handling Failed

    Updating Metro Failure Handling Remote Failed

    VM Registration Failure

    VM Registration Warning

    VSS Scripts Not Installed

    VSS Snapshot Status

    VSS VM Reachable

    VStore Snapshot Status

    Volume Group Action Status

    Volume Group Attachments Not Restored

    Vstore Replication To Backup Only Remote

    Data Protection | Remote Site
    Automatic Promote Metro Availability

    Cloud Remote Operation Failure

    Cloud Remote Site failed to start

    LWS store allocation in remote too long

    Manual Break Metro Availability

    Manual Promote Metro Availability

    Metro Connectivity

    Remote Site Health

    Remote Site Network Configuration

    Remote Site Network Mapping Configuration

    Remote Site Operation Mode ReadOnly

    Remote Site Tunnel Status

    Data Protection | Witness
    Authentication Failed in Witness

    Witness Not Configured

    Witness Not Reachable

    File server | Host
    File Server Upgrade Task Stuck Check

    File Server VM Status

    Multiple File Server Versions Check

    File server | Network
    File Server Entities Not Protected

    File Server Invalid Snapshot Warning

    File Server Network Reachable

    File Server PD Active On Multiple Sites

    File Server Reachable

    File Server Status

    Remote Site Not File Server Capable

    File server | Services
    Failed to add one or more file server admin users or groups

    File Server AntiVirus – All ICAP Servers Down

    File Server AntiVirus – Excessive Quarantined / Unquarantined Files

    File Server AntiVirus – ICAP Server Down

    File Server AntiVirus – Quarantined / Unquarantined Files Limit Reached

    File Server AntiVirus – Scan Queue Full on FSVM

    File Server AntiVirus – Scan Queue Piling Up on FSVM

    File Server Clone – Snapshot invalid

    File Server Clone Failed

    File Server Rename Failed

    Maximum connections limit reached on a file server VM

    Skipped File Server Compatibility Check

    File server | Storage Capacity
    FSVM Time Drift Status

    Failed To Run File Server Metadata Fixer Successfully

    Failed To Set VM-to-VM Anti Affinity Rule

    File Server AD Connectivity Failure

    File Server Activation Failed

    File Server CVM IP update failed

    File Server DNS Updates Pending

    File Server Home Share Creation Failed

    File Server In Heterogeneous State

    File Server Iscsi Discovery Failure

    File Server Join Domain Status

    File Server Network Change Failed

    File Server Node Join Domain Status

    File Server Performance Optimization Recommended

    File Server Quota allocation failed for user

    File Server Scale-out Status

    File Server Share Deletion Failed

    File Server Site Not Found

    File Server Space Usage

    File Server Space Usage Critical

    File Server Storage Cleanup Failure

    File Server Storage Status

    File Server Unavailable Check

    File Server Upgrade Failed

    Incompatible File Server Activation

    Share Utilization Reached Configured Limit

    Host | CPU
    CPU Utilization

    Host | Disk
    All-flash Node Intermixed Check

    Host disk usage high

    NVMe Status Check

    SATA DOM 3ME Date and Firmware Status

    SATA DOM Guest VM Check

    SATADOM Connection Status

    SATADOM Status

    SATADOM Wearout Status

    SATADOM-SL 3IE3 Wearout Status

    Samsung PM1633 FW Version

    Samsung PM1633 Version Compatibility

    Samsung PM1633 Wearout Status

    Samsung PM863a config check

    Toshiba PM3 Status

    Toshiba PM4 Config

    Toshiba PM4 FW Version

    Toshiba PM4 Status

    Toshiba PM4 Version Compatibility

    Host | Hardware
    CPU Temperature Fetch

    CPU Temperature High

    CPU Voltage

    CPU-VRM Temperature

    Correctable ECC Errors 10 Days

    Correctable ECC Errors One Day

    DIMM Voltage

    DIMM temperature high

    DIMM-VRM Temperature

    Fan Speed High

    Fan Speed Low

    GPU Status

    GPU Temperature High

    Hardware Clock Status

    IPMI SDR Status

    SAS Connectivity

    System temperature high

    Host | Memory
    Memory Swap Rate

    Ram Fault Status

    Host | Network
    10 GbE Compliance

    Hypervisor IP Address Configuration

    IPMI IP Address Configuration

    Mellanox NIC Mixed Family check

    Mellanox NIC Status check

    NIC Flapping Check

    NIC Link Down

    Node NIC Error Rate High

    Receive Packet Loss

    Transmit Packet Loss

    Host | Services
    Datastore Remount Status

    Node | Disk
    Boot device connection check

    Boot device status check

    Descriptors to deleted files check

    FusionIO PCIE-SSD: ECC errors check

    Intel Drive: ECC errors

    Intel SSD Configuration

    LSI Disk controller firmware status

    M.2 Boot Disk change check

    M.2 Intel S3520 host boot drive status check

    M.2 Micron5100 host boot drive status check

    SATA controller

    SSD Firmware Check

    Samsung PM863a FW version check

    Samsung PM863a status check

    Samsung PM863a version compatibility check

    Samsung SM863 SSD status check

    Samsung SM863a version compatibility check

    Node | Hardware
    IPMI connectivity check

    IPMI sel assertions check

    IPMI sel log fetch check

    IPMI sel power failure check

    IPMI sensor values check

    M10 GPU check

    M10 and M60 GPU Mixed check

    M60 GPU check

    Node | Network
    CVM 10 GB uplink check

    Inter-CVM connectivity check

    NTP configuration check

    Storage routed to alternate CVM check

    Node | Protection Domain
    ESX VM Virtual Hardware Version Compatible

    Node | Services
    .dvsData directory in local datastore

    Advanced Encryption Standard (AES) enabled

    Autobackup check

    BMC BIOS version check

    CVM memory check

    CVM port group renamed

    Cassandra Keyspace/Column family check

    Cassandra memory usage

    Cassandra service restarts check

    Cluster Services Down Check

    DIMM Config Check

    DIMMs Interoperability Check

    Deduplication efficiency check

    Degraded Node check

    Detected VMs with non local data

    EOF check

    ESXi AHCI Driver version check

    ESXi APD handling check

    ESXi CPU model and UVM EVC mode check

    ESXi Driver compatibility check

ESXi NFS heartbeat timeout check

    ESXi RAM disk full check

    ESXi RAM disk root usage

    ESXi Scratch Configuration

    ESXi TCP delayed ACK check

    ESXi VAAI plugin enabled

    ESXi VAAI plugin installed

    ESXi configured VMK check

    ESXi services check

    ESXi version compatibility

    File permissions check

Files in a stretched VM should be in the same Storage Container

    GPU drivers installed

    Garbage egroups check

    Host passwordless SSH

    Ivy Bridge performance check

    Mellanox NIC Driver version check

    NFS file count check

NSC (Nutanix Service Center) server FQDN resolution

    NTP server FQDN resolution

    Network adapter setting check

    Non default gflags check

    Notifications dropped check

    PYNFS dependency check

    RC local script exit statement present

    Remote syslog server check

    SMTP server FQDN resolution

    Sanity check on local.sh

    VM IDE bus check

    VMKNICs subnets check

    VMware hostd service check

    Virtual IP check

    Zookeeper Alias Check

    localcli check

    vim command check

    Nutanix Guest Tools | VM
    PostThaw Script Execution Failed

    Other Checks
    LWS Store Full

    LWS store allocation too long

    Recovery Point Objective Cannot Be Met

    VM | CPU
    CPU Utilization

    VM | Disk
    I/O Latency

    Orphan VM Snapshot Check

    VM | Memory
    Memory Pressure

    Memory Swap Rate

    VM | Network
    Memory Usage

    Receive Packet Loss

    Transmit Packet Loss

    VM | Nutanix Guest Tools
    Disk Configuration Update Failed

    VM Guest Power Op Failed

    iSCSI Configuration Failed

    VM | Remote Site
    VM Virtual Hardware Version Compatible

    VM | Services
    VM Action Status

    VM | Virtual Machine
    Application Consistent Snapshot Skipped

    NGT Mount Failure

    NGT Version Incompatible

    Temporary Hypervisor Snapshot Cleanup Failed

    VSS Snapshot Aborted

    VSS Snapshot Not Supported

    host | Network
    Hypervisor time synchronized

    Nov
    13

    NetBackup Got Upgraded: Modern Workloads use Parallel Streaming.

A new capability in NetBackup 8.1 is its ability to protect modern web-scale and big data workloads like Hadoop and NoSQL that generate massive amounts of data. With the new Veritas Parallel Streaming technology, these modern scale-out workloads can be backed up and protected with extreme efficiency by leveraging the power of multiple nodes simultaneously. The result is that organizations can now adopt these modern workloads with confidence, knowing that their data, even in massive volumes, will be protected. And since new workloads can be added via a plug-in rather than a software agent, organizations can add new workloads without having to wait for the next NetBackup software release. NetBackup Parallel Streaming also supports workloads running on hyper-converged infrastructure (HCI) from Nutanix, as Nutanix and Veritas have partnered to certify protection of those workloads on HCI.

    source


NetBackup takes care of utilizing all of the Nutanix storage controllers. For each node, NetBackup mounts the local storage controller over NFS, removing the need for a proxy host.

Some general best practices for the first release of this plugin:

* Apply a limit to back up a maximum of n*4 VMs concurrently, where n is the number of nodes in the Nutanix cluster. A 16-node cluster would then have 64 VMs being backed up concurrently (a quick sketch of this rule follows below).

* Use the Media Server as the Backup Host if possible.

Note: If VMs are powered off, then the Prism Element VIP will be used.
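A quick sketch of the n*4 rule of thumb above (plain shell arithmetic, not a specific NetBackup setting name):

nodes=16                                             # number of nodes in the Nutanix cluster
echo "Cap concurrent VM backups at $((nodes * 4))"   # prints 64 for a 16-node cluster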

    Oct
    27

    Mounting and Enabling NGT Results in an Error Message….. CD/DVD-ROM Drive

Nutanix has KB article 3455:

    Mounting and Enabling NGT Results in an Error Message that Indicates that the VM does not have a CD/DVD-ROM Drive

    If you enable Nutanix Guest Tools (NGT) in the Prism web console, the following error message is displayed.

    This VM does not have a CD/DVD-ROM device.
    OR
    Guest Tools cannot be mounted as there not enough empty CD-ROM(s)

This error message is displayed even though the VM has a CD/DVD device.

You can go ahead and read the KB, but it’s caused by newer VMware versions using a SATA controller instead of IDE for the CD-ROM. On my VM it kept switching back to SATA from IDE. I got around it by adding a second CD-ROM that was IDE.

    Oct
    17

    My Thoughts On The Kubernetes Support For Docker EE

    Oct
    02

    Automatically Snap, Clone and Backup AFS (Acropolis File Services)

I wrote a script, posted on the Nutanix Next community site, that automatically snaps and clones so that you can use any backup product that can read off an SMB share. The script can be used to always have the latest backup copy available and avoid impacting your production users.

https://next.nutanix.com/t5/Nutanix-Connect-Blog/24-Hour-Backup-Window-with-Nutanix-Native-File-Services/ba-p/23708

    Hope you find it useful.

    Sep
    25

    Maximize Your ROI and Agility with Big Data #Nutanix #Docker #BlueData

    Separate out your data from your compute for more agility.

The DataNode is what is used to build out HDFS. Typically the DataNode and the NodeManager are co-located on the same host, whether it’s physical or virtual. The NodeManager is responsible for launching and managing containers that are scheduled by the ResourceManager. On Nutanix, if you virtualize the DataNode and the NodeManager in separate virtual machines, you have the opportunity to increase your agility. The agility comes from the ability to use your resources to the max of your capacity at all times. When the cluster isn’t in use or as busy, other systems have the opportunity to use the resources. You can shut down the NodeManagers, since they’re not responsible for persisting data, and make the CPU and memory available for another project like Spark or maybe a new machine-learning program someone wants to test out.
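As a rough illustration of that shut-down step (standard Hadoop daemon commands run on each NodeManager VM, nothing Nutanix-specific; the exact script depends on your Hadoop version):

# Hadoop 2.x: stop the NodeManager daemon on this VM to hand its CPU/RAM back to other projects
$HADOOP_HOME/sbin/yarn-daemon.sh stop nodemanager
# Hadoop 3.x equivalent
yarn --daemon stop nodemanager

The DataNodes stay up, so HDFS keeps serving data while the compute tier shrinks.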

Hold the phone! What about data locality? You are correct, performance is going to take a hit. Performance may drop by up to 15% compared to the standard layout, but if your system is only busy 30% of the time, it might be more than worth it. Let’s say a job takes 60 minutes to complete. Using this new model of separating out compute and storage, the job may now take 70 minutes to complete. Is the extra 10 minutes worth the agility to use your hardware for other projects? I think so, but that is going to depend on your business requirements, of course.

On the data locality side, the DataNode still gets to benefit from reading locally. Its data path on the network isn’t going to cause more stress, so that’s a plus. The NodeManager is also busy writing all of the temporary and shuffle data locally, so that doesn’t add any stress compared to having the NodeManager write to a remote shared storage device. And in some cases the NodeManager will still talk to the local DataNode over the local hypervisor switch.

If you’re after some real flexibility, you could look at using BlueData to run Docker containers alongside the DataNodes. BlueData essentially takes over for the NodeManager. Install some CentOS VMs that fit inside the host’s NUMA node and install BlueData. BlueData can help with QoS for different tenants and lets you run different versions of Hadoop distros, Spark, Kafka and so on without blowing out your data requirements. BlueData also helps to maximize the remote connection between the containers and the HDFS distro of choice.

If you’re after more agility, want to avoid separate hardware for projects, want better ROI for systems that run only weekly, monthly, or quarterly, or want better testing methodologies, this may be the right architecture for you to try out.

    Sep
    15

    HYCU v1.5 – Nutanix AHV Backup gets new features

    HYCU v1.5 has been released by Comtrade.

The biggest one for me is ABS support! Now you can use cheap and deep storage and drive all of the storage controllers. Remember that ADS works with ABS, so it’s a great solution.

    The following new features and enhancements are available with Hycu version 1.5.0:

    Backup window
It is now possible to set up a time frame when your backup jobs are allowed to run (a backup window). For example, this allows you to schedule your backup jobs to run during non-production hours to reduce load during peak hours.

    Concurrent backups
    You can now specify the maximum number of concurrent backup jobs per target. By doing so, you can reduce the duration of backups and the amount of queued backup jobs.

    Faster recovery
Hycu enables you to keep snapshots from multiple backups on the Nutanix cluster. Keeping multiple snapshots allows you to recover a virtual machine or an application quickly, reducing downtime. If you know Commvault IntelliSnap, the benefits are very similar.

    iSCSI backup target
    A new type of backup target for storing the protected data is available—iSCSI, which also makes it possible for you to use a Nutanix volume group as a backup target. You can use the iSCSI backup target for backing up the Hycu backup controller as well.

    Improved backup and restore performance
    Hycu now utilizes Nutanix volume groups for backing up and restoring data, taking advantage of the load balancing feature offered by Nutanix. Therefore, the new version of Hycu can distribute the workload between several nodes, which results in increased
    performance of your backup and restore operations, and reduced I/O load on the Nutanix cluster and containers.

    Support for AWS S3-compatible storage
    Hycu enables you to store your protected data to AWS S3-compatible storage.

    Shared location for restoring individual files
    You can now restore individual files to a shared location so that recovered data can be accessed from multiple systems.

    Support for Active Directory applications
    In addition to SQL Server applications, Hycu can now also detect and protect Active Directory applications running on virtual machines. You can view all the discovered applications in the Applications panel.

    Expiring backups manually

    If there is a restore point that you do not want to use for a data restore anymore, you can mark it as expired. Expired backups are removed from the backup target within the next 24 hours, resulting in more free storage space and helping you to keep your Hycu system clean.

    Support for Nutanix API changes

    Hycu supports Nutanix API changes introduced with AOS 5.1.1.1

    Sep
    14

    AOS 5.1.2 Security Updates

A long list of updates; one-click upgrade yourself to safety.

    CVE-2017-1000364 kernel: heap or stack gap jumping occurs through unbounded stack allocations ( Stack Guard or Stack Clash)

    CVE-2017-1000366 glibc: heap or stack gap jumping occurs through unbounded stack allocations (Stack Guard or Stack Clash)

    CVE-2017-2628 curl: negotiate not treated as connection-oriented

    CVE-2017-3509 OpenJDK: improper re-use of NTLM authenticated connections (Networking, 8163520)

    CVE-2017-3511 OpenJDK: untrusted extension directories search path in Launcher (JCE, 8163528)

    CVE-2017-3526 OpenJDK: incomplete XML parse tree size enforcement (JAXP, 8169011)

    CVE-2017-3533 OpenJDK: newline injection in the FTP client (Networking, 8170222)

    CVE-2017-3539 OpenJDK: MD5 allowed for jar verification (Security, 8171121)

    CVE-2017-3544 OpenJDK: newline injection in the SMTP client (Networking, 8171533)

    CVE-2016-0736 httpd: Padding Oracle in Apache mod_session_crypto

    CVE-2016-1546 httpd: mod_http2 denial-of-service by thread starvation

    CVE-2016-2161 httpd: DoS vulnerability in mod_auth_digest

    CVE-2016-8740 httpd: Incomplete handling of LimitRequestFields directive in mod_http2

    CVE-2016-8743 httpd: Apache HTTP Request Parsing Whitespace Defects

    CVE-2017-8779 rpcbind, libtirpc, libntirpc: Memory leak when failing to parse XDR strings or bytearrays

    CVE-2017-3139 bind: assertion failure in DNSSEC validation

    CVE-2017-7502 nss: Null pointer dereference when handling empty SSLv2 messages

CVE-2017-1000367 sudo: Privilege escalation via improper get_process_ttyname() parsing

    CVE-2016-8610 SSL/TLS: Malformed plain-text ALERT packets could cause remote DoS

    CVE-2017-5335 gnutls: Out of memory while parsing crafted OpenPGP certificate

    CVE-2017-5336 gnutls: Stack overflow in cdk_pk_get_keyid

    CVE-2017-5337 gnutls: Heap read overflow in read-packet.c

    CVE-2017-1000366 glibc: heap/stack gap jumping via unbounded stack allocations

    CVE-2017-1000368 sudo: Privilege escalation via improper get_process_ttyname() parsing

    CVE-2017-3142 bind: An error in TSIG authentication can permit unauthorized zone transfers

    CVE-2017-3143 bind: An error in TSIG authentication can permit unauthorized dynamic updates

    CVE-2017-10053 OpenJDK: reading of unprocessed image data in JPEGImageReader (2D, 8169209)

    CVE-2017-10067 OpenJDK: JAR verifier incorrect handling of missing digest (Security, 8169392)

    CVE-2017-10074 OpenJDK: integer overflows in range check loop predicates (Hotspot, 8173770)

    CVE-2017-10078 OpenJDK: Nashorn incompletely blocking access to Java APIs (Scripting, 8171539)

    CVE-2017-10081 OpenJDK: incorrect bracket processing in function signature handling (Hotspot, 8170966)

    CVE-2017-10087 OpenJDK: insufficient access control checks in ThreadPoolExecutor (Libraries, 8172204)

    CVE-2017-10089 OpenJDK: insufficient access control checks in ServiceRegistry (ImageIO, 8172461)

    CVE-2017-10090 OpenJDK: insufficient access control checks in AsynchronousChannelGroupImpl (8172465, Libraries)

    CVE-2017-10096 OpenJDK: insufficient access control checks in XML transformations (JAXP, 8172469)

    CVE-2017-10101 OpenJDK: unrestricted access to com.sun.org.apache.xml.internal.resolver (JAXP, 8173286)

    CVE-2017-10102 OpenJDK: incorrect handling of references in DGC (RMI, 8163958)

    CVE-2017-10107 OpenJDK: insufficient access control checks in ActivationID (RMI, 8173697)

    CVE-2017-10108 OpenJDK: unbounded memory allocation in BasicAttribute deserialization (Serialization, 8174105)

    CVE-2017-10109 OpenJDK: unbounded memory allocation in CodeSource deserialization (Serialization, 8174113)

    CVE-2017-10110 OpenJDK: insufficient access control checks in ImageWatched (AWT, 8174098)

    CVE-2017-10111 OpenJDK: incorrect range checks in LambdaFormEditor (Libraries, 8184185)

    CVE-2017-10115 OpenJDK: DSA implementation timing attack (JCE, 8175106)

    CVE-2017-10116 OpenJDK: LDAPCertStore following referrals to non-LDAP URLs (Security, 8176067)

    CVE-2017-10135 OpenJDK: PKCS#8 implementation timing attack (JCE, 8176760)

    CVE-2017-10193 OpenJDK: incorrect key size constraint check (Security, 8179101)

    CVE-2017-10198 OpenJDK: incorrect enforcement of certificate path restrictions (Security, 8179998)

    Sep
    14

    Acropolis Dynamic Scheduler (ADS) for AHV (Compute + Memory + Storage)

    The Acropolis Dynamic Scheduler (ADS) ensures that compute (CPU and RAM) and storage resources are available for VMs and volume groups (VGs) in the Nutanix cluster. ADS, enabled by default, uses real-time statistics to determine:

    Initial placement of VMs and VGs, specifically which AHV host runs a particular VM at power-on or a particular VG after creation.

    Required runtime optimizations, including moving particular VMs and VGs to other AHV hosts to give all workloads the best possible access to resources.

If a problem is detected, a migration plan is created and executed, thereby eliminating hotspots in the cluster by migrating VMs from one host to another. This feature only detects contentions that are currently in progress. You can monitor these tasks from the Task dashboard of the Prism web console. You can click the VM link to view the migration information, which includes the migration path (to the destination AHV host).

    The Acropolis block services feature uses the ADS feature for balancing sessions of the externally visible iSCSI targets.

    Sep
    14

    Nutanix AFS and SMB 3.0

After you upgrade to the AFS (Acropolis File Services) 2.2 bits, you will have to manually change the max allowed protocol. This will be fixed with AFS 2.2.1, but here are the steps to get you going.

The commands listed below set the max protocol to the proper version:

    scli smbcli get --section global --param "server max protocol"
    [global]
    server max protocol = SMB2

    scli smbcli set --section global --param "server max protocol" --value SMB3_00
    smb.conf update is successful

    scli smbcli get --section global --param "server max protocol"
    [global]
    server max protocol = SMB3_00

Also note that if you want to run FSLogix and need an AFS share with a block size of 512 bytes, this can be done. The default is 1024.