Dec
    13

    Enabling AHV Turbo on AOS 5.5

    Nutanix KB 4987

    From AOS 5.5, AHV Turbo replaces the QEMU SCSI data path in the AHV architecture for improved storage performance.

    For maximum performance, ensure the following on your Linux guest VMs:

Enable the SCSI MQ feature by using the kernel command line:
scsi_mod.use_blk_mq=y (I put this in /etc/udev/rules.d/; see the GRUB-based sketch after the notes below.)

    Kernels older than 3.17 do not support SCSI MQ.
    Kernels 4.14 or later have SCSI MQ enabled by default.
    For Windows VMs, AHV VirtIO drivers will support SCSI MQ in an upcoming release.
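
For reference, here is a minimal, hedged sketch of setting the parameter on a GRUB-based distro. The file paths and config-rebuild command vary by distribution, so treat the exact lines as assumptions to verify:

# Append the parameter to the kernel command line and rebuild the GRUB config:
sudo sed -i 's/^GRUB_CMDLINE_LINUX="/&scsi_mod.use_blk_mq=y /' /etc/default/grub
sudo grub2-mkconfig -o /boot/grub2/grub.cfg   # grub-mkconfig/update-grub on Debian-based distros
sudo reboot
# After the reboot, verify the setting took effect:
cat /sys/module/scsi_mod/parameters/use_blk_mq   # prints Y when enabled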

AHV Turbo improves storage data path performance even without guest SCSI MQ support.

    Solution

    Perform the following to enable AHV Turbo on AOS 5.5.

1. Upgrade to AOS 5.5.
2. Upgrade to the AHV version bundled with AOS 5.5.
3. Ensure your VMs have SCSI MQ enabled for maximum performance.
4. Power cycle your VMs to enable AHV Turbo.
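
For step 4, keep in mind that a reboot from inside the guest is not a power cycle; the VM needs a full power off and on. A minimal sketch from any CVM using acli, with a hypothetical VM name:

acli vm.off myvm   # full power off, not a guest reboot
acli vm.on myvm    # power back on so the VM picks up the new data path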

Note that you do not have to perform this procedure if you are upgrading from AOS 5.5 to a later release. In that case, AHV Turbo will be enabled by default on your VMs.

    Dec
    12

    Running IT: Docker and Cilium for Enterprise Network Security for Micro-Services

Well, I think 40 minutes is about as long as I can last watching an IT-related video while running; after that I need music! This time I watched another video from DockerCon, Cilium – Kernel Native Security & DDOS Mitigation for Microservices with BPF.

Skip to 7:23: The quick overview of the presentation is that managing iptables to lock down micro-services isn’t going to scale and will be almost impossible to manage. Cilium is open source software for providing and transparently securing network connectivity and load balancing between application workloads such as application containers or processes. Cilium operates at Layer 3/4 to provide traditional networking and security services, as well as Layer 7 to protect and secure the use of modern application protocols such as HTTP, gRPC and Kafka. BPF is used by a lot of the big web-scale properties, like Facebook and Netflix, to secure their environments and to provide troubleshooting. Like anything with a lot of options, there are a lot of ways to shoot yourself in the foot, so Cilium provides the wrapper to get it easily deployed and configured.

The presentation uses the example of locking down a Kafka cluster at Layer 7 instead of leaving the whole API wide open, which is what happens if you are only using iptables. Kafka is used for building real-time pipelines and streaming apps. Kafka is horizontally scalable and fault-tolerant, so it’s a good choice to run in Docker. Kafka is used by 1/3 of Fortune 500 companies. A sketch of what such a policy looks like follows.
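
To make that concrete, here is a minimal sketch of a Layer 7 Kafka rule as a CiliumNetworkPolicy in Kubernetes. It is modeled on Cilium’s published examples, not taken from the talk, and the labels, port, and topic are hypothetical; fields have changed across Cilium releases, so verify against the docs for your version.

# Allow only pods labeled app=web-service to produce to the "orders" topic;
# all other traffic to the Kafka brokers is denied.
cat <<'EOF' | kubectl apply -f -
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: kafka-produce-only
spec:
  endpointSelector:
    matchLabels:
      app: kafka
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: web-service
    toPorts:
    - ports:
      - port: "9092"
        protocol: TCP
      rules:
        kafka:
        - role: produce
          topic: orders
EOF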

    Cilium Architecture

    Cilium Integrates with:

    Docker
    Kubernetes
    Mesos

Cilium runs as an agent on every host.
Cilium can provide policy between the host and a Docker micro-service, and even between two containers on the same host.

The demo didn’t pan out, but the 2nd half of the presentation talks about Cilium using BPF with XDP. XDP is a further step in that evolution: it runs a specific flavor of BPF program from the network driver, with direct access to the packet’s DMA buffer. This is, by definition, the earliest possible point in the software stack where programs can be attached, allowing for a programmable, high-performance packet processor in the Linux kernel networking data path.

Since XDP hooks in earlier, at the NIC, versus iptables with ipset, CPU is saved, rules load faster, and latency under load is a lot better with XDP. For a feel of the mechanics, a small attach/detach example follows.
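
This sketch attaches a pre-compiled XDP/BPF object to a NIC with iproute2; it is illustrative only (not from the talk), and the interface name, object file, and section name are placeholders:

# Attach an XDP program at the driver level, before the kernel allocates an skb
# (and therefore before iptables ever sees the packet):
ip link set dev eth0 xdp obj xdp_prog.o sec xdp
ip link show dev eth0          # shows an "xdp" entry while a program is attached
ip link set dev eth0 xdp off   # detach the program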

    Dec
    05

    Handling Network Partition with Near-Sync

    Near-Sync is GA!!!

    Part 1: Near-Sync Primer on Nutanix
    Part 2: Recovery Points and Schedules with Near-Sync

Perform the following procedure if a network partition (network isolation) occurs between the primary and remote site.

The following scenarios may occur when a network partition happens.

1. The network between the primary site (site A) and the remote site (site B) is restored, and both sites are working.
The primary site automatically tries to transition back into NearSync between site A and site B. No manual intervention is required.

2. Site B is not working or was destroyed (for whatever reason), and you create a new site (site C) and want to establish a sub-hourly schedule from A to C.
Configure the sub-hourly schedule from A to C. The configuration between A and C should succeed. No other manual intervention is required.

3. Site A is not working or was destroyed (for whatever reason), and you create a new site (site C) and want to configure a sub-hourly schedule from B to C.
Activate the protection domain on site B and set up the schedule between site B and site C; a sketch of the activate step follows.
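
For scenario 3, a hedged sketch of the activation from a CVM at site B; the protection domain name is a placeholder, and the schedule syntax varies by AOS release, so configure the schedule in Prism or check the ncli help for your version:

# Make site B the active site for the protection domain:
ncli protection-domain activate name="my-pd"
# Then configure the sub-hourly (NearSync) schedule from site B to site C.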

    Dec
    01

Supported Anti-Virus Offload for Nutanix Native File Services (AFS)


As the list grows with each release, I will try to keep this post updated.

As of AFS 2.2.1, the supported ICAP-based AV vendors are:

    McAfee Virus Scan Enterprise for Storage 1.2.0

    Symantec Protection Engine 7.9.0

    Kaspersky Security 10

    Sophos Antivirus

Nutanix recommends adding the following user-profile file extensions to the exclusion list when using AFS antivirus scanning:
    .dat
    .ini
    .pol

    Symantec Pre-Req

Each Symantec ICAP server needs the hotfix (SPE_7.9.0_HF03.zip) from http://www.symantec.com/docs/TECH216348 installed.

    Kaspersky Pre-Req
    When running the Database Update task with the network folder as an update source, you might encounter an error after entering credentials.

    Solution

To resolve the issue, download and install critical fix 13017 provided by Kaspersky.

    Download Link:

    https://support.kaspersky.com/13017

    Nov
    29

    Nutanix Scale Out File Services – AFS 2.2 Supported Clients

    Supported Configurations

The following are the AFS supported configurations.

  • Windows Server 2008 R2
  • Windows Server 2012
  • Windows Server 2012 R2
  • Windows Server 2016
  • Active Directory

    Domain Functional Level     Supported Domain Controller
    Windows Server 2008*        Windows Server 2008* and up
    Windows Server 2008 R2      Windows Server 2008 R2, Windows Server 2012,
                                Windows Server 2012 R2, Windows Server 2016

    * = AFS 2.0.2 and 2.1 support Windows 2008.
Clients/Use Cases

    OS Type           Supported Versions
    Apple Client      OS X El Capitan (10.11), macOS Sierra (10.12)
    Windows Client    Windows 7, Windows 8, Windows 8.1, Windows 10
    Windows Server    Windows Server 2008, Windows Server 2008 R2, Windows Server 2012,
                      Windows Server 2012 R2, Windows Server 2016

    SMB Protocol Versions

Server Message Block (SMB) is an application-layer network protocol that provides shared
access to files and communication between network nodes. AFS supports the following SMB versions.

    • SMB 2.0
    • SMB 2.1
    • SMB 3.0 (basic protocol support without specific SMB 3.0 features)

    Free CCNA Lab Guide

Free CCNA lab guide from community member Neil Anderson. You can run all the labs completely for free on your laptop; no additional equipment is necessary. Full instructions and startup files are provided so you can immediately get the hands-on practice you need to master Cisco networking and pass the exam.

    Available at: https://www.flackbox.com/cisco-ccna-lab-guide

350 pages of complete lab exercises with full solutions:

    The IOS Operating System
    The Life of a Packet
    The Cisco Troubleshooting Methodology
    Cisco Router and Switch Basics
    Cisco Device Management
    Routing Fundamentals
    Dynamic Routing Protocols
    Connectivity Troubleshooting
    RIP Routing Information Protocol
    EIGRP Enhanced Interior Gateway Routing Protocol
    OSPF Open Shortest Path First
    VLANs and Inter-VLAN Routing
    DHCP Dynamic Host Configuration Protocol
    HSRP Hot Standby Router Protocol
    STP Spanning Tree Protocol
    EtherChannel
    Port Security
    ACL Access Control Lists
    NAT Network Address Translation
    IPv6 Addressing
    IPv6 Routing
    WAN Wide Area Networks
    BGP Border Gateway Protocol
    Cisco Device Security
    Network Device Management

    Nov
    27

    Running IT: Using Docker to Scale Operational Intelligence at Splunk

I am trying to get back into running and would like to complete a marathon at some point, but I am taking it slow. Last time I tried, I got up to 36 km, but my knees and feet didn’t fare so well. With that being said, I am going to have some time on the treadmill and elliptical, and one way I can be of service is to tell you the important parts of the videos I watch and hopefully give you back some time as well. The first video I picked was Using Docker to Scale Operational Intelligence at Splunk from DockerCon 17 Europe.

I was kinda hoping for Splunk and Docker integration, but it was more about how Splunk is using Docker. Interestingly, Splunk does have a good use case for needing both Windows and Linux nodes for testing. When I first heard that Docker could have both Linux and Windows hosts in the same Swarm cluster, I thought that was cool but didn’t really know how much it would be used.

Skip to 21:33
– The first half is mainly a review of Docker EE and why Splunk picked Docker. I am sure most have heard similar stories. There is mention of RBAC for nodes, which allows secure multi-tenancy across multiple teams through node-based isolation and segregation. At the time of the video Splunk wasn’t using it, but it would have made life easier.

    Interesting Notes for the Session

Splunk started with a bare-metal test lab of 150 nodes using UCP and is now running over 600 servers.

New in Splunk 7 is Metrics, a feature for system administrators and IT tools engineers that focuses on collecting, investigating, monitoring, and sharing metrics from your technology infrastructure, security systems, and business applications in real time. Splunk is using collectd to get data into Splunk, and it also grabs the logs so you can search them from the same interface.

Splunk uses 1 container per server for performance testing.

For test/dev testing, Splunk uses 20-30 containers per server.

Splunk runs a complicated system to make sure performance and test/dev containers don’t intermix. Splunk is hoping to use the new RBAC for nodes and the CPU/memory reservations to clean up the CI workflow.

Splunk is also moving away from Jenkins for CI, using UCP to move away from agents and run over 2,000 concurrent jobs.

    Nov
    19

    Nutanix Additional Cluster Health Tooling: Panacea

There are over 450 health checks in the Cluster Health UI inside of Prism Element. To provide additional help, a new script called “panacea” has been added. Panacea is bundled with NCC 3.5 and later to provide a user-friendly interface for very advanced troubleshooting. The Nutanix Support team can take these logs and correlate results so you don’t have to wait for the problem to recur before fixing the issue.

The ability to quickly track retransmissions at very fine granularity is very important for a distributed system. I am hoping that in the future this new tooling will play into Nutanix’s ability for degraded node detection. Panacea can be run for a specific time interval during which logs will be analyzed; the possible options are:
--last_no_of_hours
--last_no_of_days
--start_time
--end_time

Log in to any CVM within the cluster; the command can be run from /home/nutanix/ncc/panacea/.
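
A hedged usage sketch; the flags are the ones listed above, but the exact launcher name and syntax may differ between NCC releases:

cd /home/nutanix/ncc/panacea
./panacea --last_no_of_hours=4   # analyze only the last 4 hours of logs
./panacea --last_no_of_days=1    # or the last full day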

The output below is from using the tool to dig for network information.

A network outage can cause degraded performance. Cluster network outage
detection is based on the following schemes:
1) Cassandra Paxos Request Timeout Exceptions/Message Drops
2) CVM Degraded node scoring
3) Ping latency

In some cases, an intermittent network issue might NOT be reflected in ping latency, but it does have an impact on TCP throughput and packet retransmission, leading to more request timeout exceptions.

TCP Retransmission:
-------------------
By default, Panacea tracks the TCP connections (destination port 7000) used by Cassandra between peer CVMs. This table displays stats of packet retransmissions per minute per TCP socket. Frequent retransmission can cause delays in the application and may reflect congestion on the host or in the network.
1) Local: Local CVM IP address
2) Remote: Remote CVM IP address
3) Max/Mean/Min/STD: number of retransmissions/min, calculated from samples where retransmission happened.
4) 25%/50%/75%: value distribution; the value that 25%, 50%, or 75% of samples fall below.
5) Ratio: N/M, where N = number of samples where retransmission happened and M = total samples in the entire data set.

+--------------+--------------+-------+------+------+------+------+------+------+---------+
| Local        | Remote       |   Max | Mean |  Min |  STD |  25% |  50% |  75% |   Ratio |
+--------------+--------------+-------+------+------+------+------+------+------+---------+
| XX.X.XXX.110 | XX.X.XXX.109 | 19.00 | 1.61 | 1.00 | 1.90 | 1.00 | 1.00 | 2.00 | 133/279 |
| XX.X.XXX.111 | XX.X.XXX.109 | 11.00 | 2.41 | 1.00 | 1.54 | 1.00 | 2.00 | 3.00 | 236/280 |
| XX.X.XXX.112 | XX.X.XXX.109 | 12.00 | 2.40 | 1.00 | 1.59 | 1.00 | 2.00 | 3.00 | 235/279 |
| XX.X.XXX.109 | XX.X.XXX.110 | 32.00 | 3.04 | 1.00 | 2.70 | 1.00 | 2.00 | 4.00 | 252/279 |
| XX.X.XXX.111 | XX.X.XXX.110 |  9.00 | 1.51 | 1.00 | 1.02 | 1.00 | 1.00 | 2.00 | 152/280 |
| XX.X.XXX.112 | XX.X.XXX.110 | 11.00 | 2.21 | 1.00 | 1.31 | 1.00 | 2.00 | 3.00 | 231/279 |
| XX.X.XXX.109 | XX.X.XXX.111 |  9.00 | 2.01 | 1.00 | 1.20 | 1.00 | 2.00 | 2.00 | 202/279 |
| XX.X.XXX.110 | XX.X.XXX.111 | 10.00 | 2.70 | 1.00 | 1.68 | 1.00 | 2.00 | 3.00 | 244/279 |
| XX.X.XXX.112 | XX.X.XXX.111 |  4.00 | 1.46 | 1.00 | 0.76 | 1.00 | 1.00 | 2.00 | 135/279 |
| XX.X.XXX.109 | XX.X.XXX.112 |  5.00 | 1.56 | 1.00 | 0.85 | 1.00 | 1.00 | 2.00 | 150/279 |
| XX.X.XXX.110 | XX.X.XXX.112 |  6.00 | 2.05 | 1.00 | 1.18 | 1.00 | 2.00 | 3.00 | 234/279 |
| XX.X.XXX.111 | XX.X.XXX.112 | 16.00 | 3.26 | 1.00 | 2.24 | 2.00 | 3.00 | 4.00 | 261/280 |
+--------------+--------------+-------+------+------+------+------+------+------+---------+

Below are most of the 450 Cluster Health checks inside of Prism, all with automatic alerting.

    CVM | CPU
    CPU Utilization

    Load Level

    Node Avg Load – Critical

    CVM | Disk
    Boot RAID Health

    Disk Configuration

    Disk Diagnostic Status

    Disk Metadata Usage

    Disk Offline Status

    HDD Disk Usage

    HDD I/O Latency

    HDD S.M.A.R.T Health Status

    Metadata Disk Mounted Check

    Metro Vstore Mount Status

    Non SED Disk Inserted Check

    Nutanix System Partitions Usage High

    Password Protected Disk Status

    Physical Disk Remove Check

    Physical Disk Status

    SED Operation Status

    SSD I/O Latency

    CVM | Hardware
    Agent VM Restoration

    FT2 Configuration

    Host Evacuation Status

    Node Status

    VM HA Healing Status

    VM HA Status

    VMs Restart Status

    CVM | Memory
    CVM Memory Pinned Check

    CVM Memory Usage

    Kernel Memory Usage

    CVM | Network
    CVM IP Address Configuration

    CVM NTP Time Synchronization

    Duplicate Remote Cluster ID Check

    Host IP Pingable

    IP Configuration

    SMTP Configuration

    Subnet Configuration

    Virtual IP Configuration

    vCenter Connection Check

    CVM | Protection Domain
    Entities Restored Check

    Restored Entities Protected

    CVM | Services
    Admin User API Authentication Check

    CVM Rebooted Check

    CVM Services Status

    Cassandra Waiting For Disk Replacement

    Certificate Creation Status

    Cluster In Override Mode

    Cluster In Read-Only Mode

    Curator Job Status

    Curator Scan Status

    Kerberos Clock Skew Status

    Metadata Drive AutoAdd Disabled Check

    Metadata Drive Detached Check

    Metadata Drive Failed Check

    Metadata Drive Ring Check

    Metadata DynRingChangeOp Slow Check

    Metadata DynRingChangeOp Status

    Metadata Imbalance Check

    Metadata Size

    Node Degradation Status

    RemoteSiteHighLatency

    Stargate Responsive

    Stargate Status

    Upgrade Bundle Available

    CVM | Storage Capacity
    Compression Status

    Finger Printing Status

    Metadata Usage

    NFS Metadata Size Overshoot

    On-Disk Dedup Status

    Space Reservation Status

    vDisk Block Map Usage

    vDisk Block Map Usage Warning

    Cluster | CPU
    CPU type on chassis check

    Cluster | Disk
    CVM startup dependency check

    Disk online check

    Duplicate disk id check

    Flash Mode Configuration

    Flash Mode Enabled VM Power Status

    Flash Mode Usage

    Incomplete disk removal

    Storage Pool Flash Mode Configuration

    System Defined Flash Mode Usage Limit

    Cluster | Hardware
    Power Supply Status

    Cluster | Network
    CVM Passwordless Connectivity Check

    CVM to CVM Connectivity

    Duplicate CVM IP check

    NIC driver and firmware version check

    Time Drift

    Cluster | Protection Domain
    Duplicate VM names

    Internal Consistency Groups Check

    Linked Clones in high frequency snapshot schedule

    SSD Snapshot reserve space check

    Snapshot file location check

    Cluster | Remote Site
    Cloud Remote Alert

    Remote Site virtual external IP(VIP)

    Cluster | Services
    AWS Instance Check

    AWS Instance Type Check

    Acropolis Dynamic Scheduler Status

    Alert Manager Service Check

    Automatic Dedup disabled check

    Automatic disabling of Deduplication

    Backup snapshots on metro secondary check

    CPS Deployment Evaluation Mode

    CVM same timezone check

    CVM virtual hardware version check

    Cassandra Similar Token check

    Cassandra metadata balanced across CVMs

    Cassandra nodes up

    Cassandra service status check

    Cassandra tokens consistent

    Check that cluster virtual IP address is part of cluster external subnet

    Checkpoint snapshot on Metro configured Protection Domain

    Cloud Gflags Check

    Cloud Remote Version Check

    Cloud remote check

    Cluster NCC version check

    Cluster version check

    Compression disabled check

    Curator scan time elapsed check

    Datastore VM Count Check

    E-mail alerts check

    E-mail alerts contacts configuration

    HTTP proxy check

    Hardware configuration validation

    High disk space usage

    Hypervisor version check

    LDAP configuration

    Linked clones on Dedup check

    Multiple vCenter Servers Discovered

    NGT CA Setup Check

    Oplog episodes check

    Pulse configuration

    RPO script validation on storage heavy cluster

    Remote Support Status

    Report Generation Failure

    Report Quota Scan Failure

    Send Report Through E-mail Failure

    Snapshot chain height check

    Snapshots space utilization status

    Storage Pool SSD tier usage

    Stretch Connectivity Lost

    VM group Snapshot and Current Mismatch

    Zookeeper active on all CVMs

    Zookeeper fault tolerance check

    Zookeeper nodes distributed in multi-block cluster

    vDisk Count Check

    Cluster | Storage Capacity
    Erasure Code Configuration

    Erasure Code Garbage

    Erasure coding pending check

    Erasure-Code-Delay Configuration

    High Space Usage on Storage Container

    Storage Container RF Status

    Storage Container Space Usage

    StoragePool Space Usage

    Volume Group Space Usage

    Data Protection | Protection Domain
    Aged Third-party Backup Snapshot Check

    Check VHDX Disks

    Clone Age Check

    Clone Count Check

    Consistency Group Configuration

    Cross Hypervisor NGT Installation Check

    EntityRestoreAbort

    External iSCSI Attachments Not Snapshotted

    Failed To Mount NGT ISO On Recovery of VM

    Failed To Recover NGT Information

    Failed To Recover NGT Information for VM

    Failed To Snapshot Entities

    Incorrect Cluster Information in Remote Site

    Metadata Volume Snapshot Persistent

    Metadata Volume Snapshot Status

    Metro Availability

    Metro Availability Prechecks Failed

    Metro Availability Secondary PD sync check

    Metro Old Primary Site Hosting VMs

    Metro Protection domain VMs running at Sub-optimal performance

    Metro Vstore Symlinks Check

    Metro/Vstore Consistency Group File Count Check

    Metro/Vstore Protection Domain File Count Check

    NGT Configuration

    PD Active

    PD Change Mode Status

    PD Full Replication Status

    PD Replication Expiry Status

    PD Replication Skipped Status

    PD Snapshot Retrieval

    PD Snapshot Status

    PD VM Action Status

    PD VM Registration Status

Protected VM CBR Capability

    Protected VM Not Found

    Protected VMs Not Found

    Protected VMs Storage Configuration

    Protected Volume Group Not Found

    Protected Volume Groups Not Found

    Protection Domain Decoupled Status

    Protection Domain Initial Replication Pending to Remote Site

    Protection Domain Replication Stuck

    Protection Domain Snapshots Delayed

    Protection Domain Snapshots Queued for Replication to Remote Site

    Protection Domain VM Count Check

    Protection Domain fallback to lower frequency replications to remote

    Protection Domain transitioning to higher frequency snapshot schedule

    Protection Domain transitioning to lower frequency snapshot schedule

    Protection Domains sharing VMs

    Related Entity Protection Status

    Remote Site NGT Support

    Remote Site Snapshot Replication Status

    Remote Stargate Version Check

    Replication Of Deduped Entity

    Self service restore operation Failed

    Snapshot Crash Consistent

    Snapshot Symlink Check

    Storage Container Mount

    Updating Metro Failure Handling Failed

    Updating Metro Failure Handling Remote Failed

    VM Registration Failure

    VM Registration Warning

    VSS Scripts Not Installed

    VSS Snapshot Status

    VSS VM Reachable

    VStore Snapshot Status

    Volume Group Action Status

    Volume Group Attachments Not Restored

    Vstore Replication To Backup Only Remote

    Data Protection | Remote Site
    Automatic Promote Metro Availability

    Cloud Remote Operation Failure

    Cloud Remote Site failed to start

    LWS store allocation in remote too long

    Manual Break Metro Availability

    Manual Promote Metro Availability

    Metro Connectivity

    Remote Site Health

    Remote Site Network Configuration

    Remote Site Network Mapping Configuration

    Remote Site Operation Mode ReadOnly

    Remote Site Tunnel Status

    Data Protection | Witness
    Authentication Failed in Witness

    Witness Not Configured

    Witness Not Reachable

    File server | Host
    File Server Upgrade Task Stuck Check

    File Server VM Status

    Multiple File Server Versions Check

    File server | Network
    File Server Entities Not Protected

    File Server Invalid Snapshot Warning

    File Server Network Reachable

    File Server PD Active On Multiple Sites

    File Server Reachable

    File Server Status

    Remote Site Not File Server Capable

    File server | Services
    Failed to add one or more file server admin users or groups

    File Server AntiVirus – All ICAP Servers Down

    File Server AntiVirus – Excessive Quarantined / Unquarantined Files

    File Server AntiVirus – ICAP Server Down

    File Server AntiVirus – Quarantined / Unquarantined Files Limit Reached

    File Server AntiVirus – Scan Queue Full on FSVM

    File Server AntiVirus – Scan Queue Piling Up on FSVM

    File Server Clone – Snapshot invalid

    File Server Clone Failed

    File Server Rename Failed

    Maximum connections limit reached on a file server VM

    Skipped File Server Compatibility Check

    File server | Storage Capacity
    FSVM Time Drift Status

    Failed To Run File Server Metadata Fixer Successfully

    Failed To Set VM-to-VM Anti Affinity Rule

    File Server AD Connectivity Failure

    File Server Activation Failed

    File Server CVM IP update failed

    File Server DNS Updates Pending

    File Server Home Share Creation Failed

    File Server In Heterogeneous State

    File Server Iscsi Discovery Failure

    File Server Join Domain Status

    File Server Network Change Failed

    File Server Node Join Domain Status

    File Server Performance Optimization Recommended

    File Server Quota allocation failed for user

    File Server Scale-out Status

    File Server Share Deletion Failed

    File Server Site Not Found

    File Server Space Usage

    File Server Space Usage Critical

    File Server Storage Cleanup Failure

    File Server Storage Status

    File Server Unavailable Check

    File Server Upgrade Failed

    Incompatible File Server Activation

    Share Utilization Reached Configured Limit

    Host | CPU
    CPU Utilization

    Host | Disk
    All-flash Node Intermixed Check

    Host disk usage high

    NVMe Status Check

    SATA DOM 3ME Date and Firmware Status

    SATA DOM Guest VM Check

    SATADOM Connection Status

    SATADOM Status

    SATADOM Wearout Status

    SATADOM-SL 3IE3 Wearout Status

    Samsung PM1633 FW Version

    Samsung PM1633 Version Compatibility

    Samsung PM1633 Wearout Status

    Samsung PM863a config check

    Toshiba PM3 Status

    Toshiba PM4 Config

    Toshiba PM4 FW Version

    Toshiba PM4 Status

    Toshiba PM4 Version Compatibility

    Host | Hardware
    CPU Temperature Fetch

    CPU Temperature High

    CPU Voltage

    CPU-VRM Temperature

    Correctable ECC Errors 10 Days

    Correctable ECC Errors One Day

    DIMM Voltage

    DIMM temperature high

    DIMM-VRM Temperature

    Fan Speed High

    Fan Speed Low

    GPU Status

    GPU Temperature High

    Hardware Clock Status

    IPMI SDR Status

    SAS Connectivity

    System temperature high

    Host | Memory
    Memory Swap Rate

    Ram Fault Status

    Host | Network
    10 GbE Compliance

    Hypervisor IP Address Configuration

    IPMI IP Address Configuration

    Mellanox NIC Mixed Family check

    Mellanox NIC Status check

    NIC Flapping Check

    NIC Link Down

    Node NIC Error Rate High

    Receive Packet Loss

    Transmit Packet Loss

    Host | Services
    Datastore Remount Status

    Node | Disk
    Boot device connection check

    Boot device status check

    Descriptors to deleted files check

    FusionIO PCIE-SSD: ECC errors check

    Intel Drive: ECC errors

    Intel SSD Configuration

    LSI Disk controller firmware status

    M.2 Boot Disk change check

    M.2 Intel S3520 host boot drive status check

    M.2 Micron5100 host boot drive status check

    SATA controller

    SSD Firmware Check

    Samsung PM863a FW version check

    Samsung PM863a status check

    Samsung PM863a version compatibility check

    Samsung SM863 SSD status check

    Samsung SM863a version compatibility check

    Node | Hardware
    IPMI connectivity check

    IPMI sel assertions check

    IPMI sel log fetch check

    IPMI sel power failure check

    IPMI sensor values check

    M10 GPU check

    M10 and M60 GPU Mixed check

    M60 GPU check

    Node | Network
    CVM 10 GB uplink check

    Inter-CVM connectivity check

    NTP configuration check

    Storage routed to alternate CVM check

    Node | Protection Domain
    ESX VM Virtual Hardware Version Compatible

    Node | Services
    .dvsData directory in local datastore

    Advanced Encryption Standard (AES) enabled

    Autobackup check

    BMC BIOS version check

    CVM memory check

    CVM port group renamed

    Cassandra Keyspace/Column family check

    Cassandra memory usage

    Cassandra service restarts check

    Cluster Services Down Check

    DIMM Config Check

    DIMMs Interoperability Check

    Deduplication efficiency check

    Degraded Node check

    Detected VMs with non local data

    EOF check

    ESXi AHCI Driver version check

    ESXi APD handling check

    ESXi CPU model and UVM EVC mode check

    ESXi Driver compatibility check

ESXi NFS heartbeat timeout check

    ESXi RAM disk full check

    ESXi RAM disk root usage

    ESXi Scratch Configuration

    ESXi TCP delayed ACK check

    ESXi VAAI plugin enabled

    ESXi VAAI plugin installed

    ESXi configured VMK check

    ESXi services check

    ESXi version compatibility

    File permissions check

Files in a stretched VM should be in the same Storage Container

    GPU drivers installed

    Garbage egroups check

    Host passwordless SSH

    Ivy Bridge performance check

    Mellanox NIC Driver version check

    NFS file count check

NSC (Nutanix Service Center) server FQDN resolution

    NTP server FQDN resolution

    Network adapter setting check

    Non default gflags check

    Notifications dropped check

    PYNFS dependency check

    RC local script exit statement present

    Remote syslog server check

    SMTP server FQDN resolution

    Sanity check on local.sh

    VM IDE bus check

    VMKNICs subnets check

    VMware hostd service check

    Virtual IP check

    Zookeeper Alias Check

    localcli check

    vim command check

    Nutanix Guest Tools | VM
    PostThaw Script Execution Failed

    Other Checks
    LWS Store Full

    LWS store allocation too long

    Recovery Point Objective Cannot Be Met

    VM | CPU
    CPU Utilization

    VM | Disk
    I/O Latency

    Orphan VM Snapshot Check

    VM | Memory
    Memory Pressure

    Memory Swap Rate

    VM | Network
    Memory Usage

    Receive Packet Loss

    Transmit Packet Loss

    VM | Nutanix Guest Tools
    Disk Configuration Update Failed

    VM Guest Power Op Failed

    iSCSI Configuration Failed

    VM | Remote Site
    VM Virtual Hardware Version Compatible

    VM | Services
    VM Action Status

    VM | Virtual Machine
    Application Consistent Snapshot Skipped

    NGT Mount Failure

    NGT Version Incompatible

    Temporary Hypervisor Snapshot Cleanup Failed

    VSS Snapshot Aborted

    VSS Snapshot Not Supported

Host | Network
    Hypervisor time synchronized

    Nov
    13

    NetBackup Got Upgraded: Modern Workloads use Parallel Streaming.

    A new capability in NetBackup 8.1 is its ability to protect modern web-scale and big data workloads like Hadoop and NoSQL that generate massive amounts of data. With the new Veritas Parallel Streaming technology, these modern scale-out workloads can be backed up and protected with extreme efficiency by leveraging the power of multiple nodes simultaneously. The result is that organizations can now adopt these modern workloads with confidence, knowing that their data, even in massive volumes, will be protected. And since new workloads can be added via a plug-in rather than a software agent, organizations can add new workloads without having to wait for a next NetBackup software release. NetBackup Parallel Streaming also supports workloads running on hyper-converged infrastructure (HCI) from Nutanix, as Nutanix and Veritas have partnered to certify protection of those workloads on HCI.

    source


NetBackup takes care of utilizing all of the Nutanix storage controllers. It mounts the local storage controller of each node over NFS, removing the need for a proxy host.

    For some general best practices with the first release of this plugin:

  * Apply a limit to back up a maximum of n*4 VMs concurrently, where n is the number of nodes in the Nutanix cluster. A 16-node cluster would then have 64 VMs being backed up concurrently.

      * Use Media Server as Backup Host if possible.

Note: If VMs are powered off, then the Prism Element VIP will be used.

    Oct
    27

    Mounting and Enabling NGT Results in an Error Message….. CD/DVD-ROM Drive

Nutanix has KB article 3455:

    Mounting and Enabling NGT Results in an Error Message that Indicates that the VM does not have a CD/DVD-ROM Drive

    If you enable Nutanix Guest Tools (NGT) in the Prism web console, the following error message is displayed.

    This VM does not have a CD/DVD-ROM device.
    OR
    Guest Tools cannot be mounted as there not enough empty CD-ROM(s)

This error message is displayed even though the VM has a CD/DVD device.

You can go ahead and read the KB, but it’s caused by newer VMware versions using a SATA controller instead of IDE for the CD-ROM. On my VM it kept switching back to SATA from IDE. I got around it by adding a 2nd CD-ROM that was IDE.