Jan
19

The Wish List for Nutanix-Centric Enterprise Backups?

I was asked today what I would look for in a backup solution that was Nutanix-focused. Just quickly spitballing, I came up with the following list:

Backup vendor must use ABS (Acropolis Block Services) to eliminate the need for multiple proxies, or solve this using a similar method.

Backup vendor must support, or show a roadmap of supporting, AFS changed file tracking; at a minimum, support NAS backup.

Backup vendor must support backing up volume groups.

Backup vendor must support backing up Nutanix snapshots that have been replicated from a remote site to a primary DC.

Backup vendor must support Change Region Tracking for AHV. ESXi is a plus.

Backup vendor must support synthetic full backups.

Backup vendor must have native support for application X (list your app, e.g., SQL).

Backup vendor must have an open API.

Got any others to add?

Jan
15

Prism Central with Self Service Portal – Cheat Notes

The Prism Self Service feature represents a special view within Prism Central. While Prism Central enables infrastructure management across clusters, Prism Self Service allows end-users to consume that infrastructure in a self-service manner. Prism Self Service uses the resources provided by a single AHV cluster. (ESXi and Hyper-V are not supported platforms for Prism Self Service.)

    Nutanix recommends using the Chrome or Firefox browsers to deploy or install Prism Central (PC). Nutanix support has a KB if IE is the only allowed browser.
    In Prism Central 5.5, users that are part of nested groups cannot log on to the Prism Central web console.
    Always upgrade PC before Prism Element (your clusters).
    If you want longer retention, go with a bigger PC instance, due to its larger disk size.
    Prism Central and its managed clusters are not supported in environments deploying Network Address Translation (NAT).
    Best practice: keep the NCC version the same on all managed clusters.
    As of Prism Central 5.5, only User Principal Name (UPN) credentials are accepted for logon. The admin user must log on and specify a service account for the directory service in the Authentication Configuration dialog box before authentication for other users can start working.
    Name servers are computers that host a network service for providing responses to queries against a directory service, such as a DNS server. Changes in name server configuration may take up to 5 minutes to take effect. Functions that rely on DNS may not work properly during this time. If Prism Central is running on Hyper-V, you must specify the IP address of the Active Directory Domain Controller server, not the hostname. Do not use DNS hostnames or external NTP servers.
    There are three primary roles when configuring Prism Self Service:

      Prism Central administrator
      Self-service administrator
      Project user

    Prism Central administrator. The Prism Central administrator enables Prism Self Service and creates one or more self-service administrators. Prism Central administrators also create VMs, images, and network configurations that may be consumed by self-service users.

    Self-service administrator. The self-service administrator performs the following tasks:
    Creates a project for each team that needs self-service and adds Active Directory users and groups to the projects.
    Configures roles for project members.
    Publishes VM templates and images to the catalog.
    Monitors resource usage by the various projects and their VMs and members, and then adjusts resource quotas as necessary.
    A Prism Central administrator can also perform any of these tasks, but they are normally delegated to a self-service administrator.
    Self-service administrators have full access to all VMs running on the Nutanix cluster, including infrastructure VMs not tied to a project. Self-service administrators can assign infrastructure VMs to project members, add them to the catalog, and delete them even if they do not have administrative access to Prism Central.


Setting Up AD with SSP

    Users with the “User must change password at next logon” attribute enabled will not be able to authenticate to Prism Central. Ensure users with this attribute first log in to a domain workstation and change their password prior to accessing Prism Central. Also, if SSL is enabled on the Active Directory server, make sure that Nutanix has access to that port (open it in the firewall).
    Port 389 (LDAP). Use this port number (in the following URL form) when the configuration is single domain, single forest, and not using SSL.
    ldap://ad_server.mycompany.com:389
    Port 636 (LDAPS). Use this port number (in the following URL form) when the configuration is single domain, single forest, and using SSL. This requires all Active Directory Domain Controllers have properly installed SSL certificates.
    ldaps://ad_server.mycompany.com:636
    Port 3268 (LDAP – GC). Use this port number when the configuration is multiple domain, single forest, and not using SSL.
    Port 3269 (LDAPS – GC). Use this port number when the configuration is multiple domain, single forest, and using SSL.
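    The Global Catalog ports follow the same URL pattern; for example (assuming the same example host as above):
    ldap://ad_server.mycompany.com:3268
    ldaps://ad_server.mycompany.com:3269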
    Within a project: Allow collaboration: Check the box to allow any group member to see the VMs, applications, and other objects created by other members of the group. If this box is not checked, group members can see only the objects they create. The role assigned to a group member determines the permissions that user has on objects created by other group members.
    Role Mapping – Prism matches the AD group name using case-sensitive checks, so if the group name defined under the role mapping in Prism differs in upper/lower case from how it is defined in AD, Prism will fail to perform the name mapping for the group. For example, an AD group named “NTNX-Admins” must be entered as “NTNX-Admins”, not “ntnx-admins”.

    Also ensure that users append “@domain_name” to the username when logging in to Prism Central.

Dec
20

Docker Swarm with Nutanix Calm

Review -> What is Nutanix CALM?

Nutanix Calm provides a set of pre-seeded application blueprints that are available to you for consumption.

Docker Swarm is a clustering and scheduling tool for Docker containers. There is a lot of hype around Kubernetes right now, and rightly so, but Swarm is a great tool and still getting better. One of the blueprints available with Calm is Docker Swarm. With Swarm, IT administrators and developers can establish and manage a cluster of Docker nodes as a single virtual system. Swarm mode also exists natively for Docker Engine, the layer between the OS and container images; it integrates the orchestration capabilities of Docker Swarm into Docker Engine. For AHV, by default the blueprint creates 3 master VMs, each with 2 cores, 4 GB RAM, a 10 GB root disk, and 3x10 GB data disks. For AWS, by default the blueprint creates 3 slave VMs of type t2.medium with 3x10 GB data disks.
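
The blueprint wires all of this up for you, but it is worth knowing what swarm mode looks like underneath. A minimal sketch using the standard Docker CLI (the IP address and token below are placeholders, not values from the blueprint):

    # on the first master node: initialize the swarm
    docker swarm init --advertise-addr <MASTER_IP>

    # 'swarm init' prints a join token; run the join command on each additional node
    docker swarm join --token <TOKEN> <MASTER_IP>:2377

    # from any manager node, verify cluster membership
    docker node ls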


Installed version: Docker 17.09.0.ce

Variables

DOCKER_VERSION - (Mandatory) Docker version; a default is supplied by the blueprint.
INSTANCE_PUBLIC_KEY - (Mandatory) Instance public key (only for AHV).
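
If you do not already have a key pair to use for INSTANCE_PUBLIC_KEY, here is a quick sketch of generating one (the file name is just an example):

    # generate an RSA key pair for the blueprint
    ssh-keygen -t rsa -b 2048 -N "" -f ~/.ssh/calm_swarm_key

    # paste the contents of the .pub file into INSTANCE_PUBLIC_KEY
    cat ~/.ssh/calm_swarm_key.pub

    # the matching private key is what you supply at the "Enter the private key" step
    cat ~/.ssh/calm_swarm_key
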
Click the Marketplace tab.
Click the Docker Swarm blueprint application.
Click Launch.
The blueprint application launch page is displayed.

Enter a name for the application in the Name of the Application field. For the application blueprint naming conventions, see Launching an Application Blueprint.
Select the Application profile.
If the application profile is Nutanix, do the following:
(Optional) Change the VM name.
(Optional) Change the number of vCPUs and RAM.
Select the NIC from the drop-down menu.
Download the CentOS 7 image from the repository.
Enter the private key.
If the application profile is AWS, do the following:
(Optional) Change the VM name.
Select the instance type.
Select a CentOS 7 image as per the region and AZ.
Select the VPC and subnet.
Ensure that the security groups allow ICMP so that the master and slave nodes can ping each other.

Select the SSH keys.
Repeat the above steps for docker slave services.
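
Once the application shows as running, you can sanity-check the swarm from one of the master VMs. A rough sketch (the key file is from the step above; the login user depends on the image you picked, e.g. centos for the stock CentOS 7 cloud image):

    # SSH to a master VM with the private key supplied to the blueprint
    ssh -i ~/.ssh/calm_swarm_key centos@<MASTER_VM_IP>

    # confirm the Docker version and that all masters/slaves joined the swarm
    docker --version
    docker node ls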

    Dec
    15

    Nutanix Calm Blueprints Overview

    Nutanix Calm Overview

    A blueprint is the framework for every application that you model by using Nutanix Calm. Blueprints are templates that describe all the steps required to provision, configure, and execute tasks on the services and applications that are created. You can create a blueprint to represent the architecture of your application and then run the blueprint repeatedly to create an instance, provision it, and launch your applications. A blueprint also defines the lifecycle of an application and its underlying infrastructure, starting from the creation of the application, through the actions carried out on it, to the termination of the application.

    You can use blueprints to model applications of various complexities, from simply provisioning a single virtual machine to provisioning and managing a multi-node, multi-tier application.

    The blueprint editor provides a graphical representation of the various components, enabling you to visualize and configure the components and their dependencies in your environment.

    The end result is repeatable and auditable automation.

    Dec
    14

    What is Nutanix CALM?

    Nutanix Calm allows you to seamlessly select, provision, and manage your business applications across your infrastructure for both private and public clouds. Nutanix Calm provides application lifecycle management, monitoring, and remediation to manage your heterogeneous infrastructure, for example, VMs or bare-metal servers. Nutanix Calm supports multiple platforms so that you can use a single self-service and automation interface to manage all of your infrastructure. Nutanix Calm provides an interactive and user-friendly graphical user interface (GUI) to manage your infrastructure.

    Features of Nutanix Calm

    Application Lifecycle Management: Automates the provisioning and deletion of both traditional multi-tiered applications and modern distributed services by using pre-integrated blueprints that make management of applications simple in both private (AHV) and public cloud (AWS) environments.

    Customizable Blueprints: Simplifies the setup and management of custom enterprise applications by incorporating the elements of each app, including relevant VMs, configurations and related binaries into an easy-to-use blueprint that can be managed by the infrastructure team. More Info on Blueprints.

    Nutanix Marketplace: Publishes the application blueprints directly to the end users through the Marketplace.

    Governance: Maintains control with role-based governance, limiting user operations based on permissions.

    Hybrid Cloud Management: Automates the provisioning of a hybrid cloud architecture, scaling both multi-tiered and distributed applications across cloud environments, including AWS.

    Dec
    13

    Enabling AHV Turbo on AOS 5.5

    Nutanix KB 4987

    From AOS 5.5, AHV Turbo replaces the QEMU SCSI data path in the AHV architecture for improved storage performance.

    For maximum performance, ensure the following on your Linux guest VMs:

    Enable the SCSI MQ feature by using the kernel command line:
    scsi_mod.use_blk_mq=y (I put this in /etc/udev/rules.d/; see the GRUB sketch below for another way to set it)

    Kernels older than 3.17 do not support SCSI MQ.
    Kernels 4.14 or later have SCSI MQ enabled by default.
    For Windows VMs, AHV VirtIO drivers will support SCSI MQ in an upcoming release.

    AHV Turbo improves the storage data path performance even without the guest SCSI MQ support.
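
    A minimal sketch of making the setting permanent on a CentOS/RHEL 7 guest by putting it on the kernel command line through GRUB (BIOS grub.cfg path shown; UEFI systems use a different path, and your grub file layout may differ):

        # append the parameter to the kernel command line
        sudo sed -i 's/GRUB_CMDLINE_LINUX="/GRUB_CMDLINE_LINUX="scsi_mod.use_blk_mq=y /' /etc/default/grub

        # rebuild the grub config and reboot the guest
        sudo grub2-mkconfig -o /boot/grub2/grub.cfg
        sudo reboot

        # after the reboot, verify SCSI MQ is on (prints Y when enabled)
        cat /sys/module/scsi_mod/parameters/use_blk_mq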

    Solution

    Perform the following to enable AHV Turbo on AOS 5.5.

    Upgrade to AOS 5.5.
    Upgrade to the AHV version bundled with AOS 5.5.
    Ensure your VMs have SCSI MQ enabled for maximum performance
    Power cycle your VMs to enable AHV Turbo.

    Note that you do not have to perform this procedure if you are upgrading from AOS 5.5 to a later release. AHV Turbo will be enabled by default on your VMs in that case.

    Dec
    05

    Handling Network Partition with Near-Sync

    Near-Sync is GA!!!

    Part 1: Near-Sync Primer on Nutanix
    Part 2: Recovery Points and Schedules with Near-Sync

    Perform the following procedure if a network partition (network isolation) occurs between the primary and remote sites.

    The following scenarios may occur after a network partition:

    1. Network between the primary site (site A) and the remote site (site B) is restored and both sites are working.
    The primary site tries to transition into NearSync automatically between site A and site B. No manual intervention is required.

    2. Site B is not working or has been destroyed (for whatever reason), and you create a new site (site C) and want to establish a sub-hourly schedule from A to C.
    Configure the sub-hourly schedule from A to C.
    The configuration between A and C should succeed. No other manual intervention is required.

    3. Site A is not working or has been destroyed (for whatever reason), and you create a new site (site C) and try to configure a sub-hourly schedule from B to C.
    Activate the protection domain on site B and set up the schedule between site B and site C.

    Nov
    19

    Nutanix Additional Cluster Health Tooling: Panacea

    There are over 450 health checks in the Cluster Health UI inside of Prism Element. To provide additional help, a new script called “panacea” has been added. Panacea is bundled with NCC 3.5 and later to provide a user-friendly interface for very advanced troubleshooting. The Nutanix Support team can take these logs and correlate the results so you don’t have to wait for the problem to reoccur before fixing the issue.

    The ability to quickly track retransmissions at very low granularity on a distributed system is very important. I am hoping that in the future this new tooling will play into Nutanix’s ability to detect degraded nodes. Panacea can be run for a specific time interval during which logs will be analyzed; possible options are:
    --last_no_of_hours
    --last_no_of_days
    --start_time
    --end_time

    Log in to any CVM within the cluster; the command can be run from home/nutanix/ncc/panacea/
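
    I have not listed the exact syntax here, so treat the following as a hypothetical invocation using one of the time-window options above (check the NCC documentation for the real usage):

        # hypothetical example: analyze the last 4 hours of logs
        cd ~/ncc/panacea
        ./panacea --last_no_of_hours=4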

    The below output is from using the tool when digging for network information.

    A network outage can cause degraded performance. Cluster network outage detection is based on the following schemes:
    1) Cassandra Paxos Request Timeout Exceptions/Message Drops
    2) CVM Degraded node scoring
    3) Ping latency

    In some cases, an intermittent network issue might NOT be reflected in ping latency, but it does have an impact on TCP throughput and packet retransmission, leading to more request timeout exceptions.

    TCP Retransmission:
    -------------------
    By default, Panacea tracks the TCP connections (destination port 7000) used by Cassandra between peer CVMs. This table displays stats on packet retransmissions per minute per TCP socket. Frequent retransmission can delay the application and may reflect congestion on the host or in the network.
    1) Local: Local CVM IP address
    2) Remote: Remote CVM IP address
    3) Max/Mean/Min/STD: number of retransmissions/min, calculated from samples where retransmission happened.
    4) 25%/50%/75%: value distribution; the value below which 25, 50, or 75 percent of the samples fall.
    5) Ratio: N/M, where N = number of samples where retransmission happened and M = total samples in the entire data set.

    +--------------+--------------+-------+------+------+------+------+------+------+---------+
    | Local        | Remote       | Max   | Mean | Min  | STD  | 25%  | 50%  | 75%  | Ratio   |
    +--------------+--------------+-------+------+------+------+------+------+------+---------+
    | XX.X.XXX.110 | XX.X.XXX.109 | 19.00 | 1.61 | 1.00 | 1.90 | 1.00 | 1.00 | 2.00 | 133/279 |
    | XX.X.XXX.111 | XX.X.XXX.109 | 11.00 | 2.41 | 1.00 | 1.54 | 1.00 | 2.00 | 3.00 | 236/280 |
    | XX.X.XXX.112 | XX.X.XXX.109 | 12.00 | 2.40 | 1.00 | 1.59 | 1.00 | 2.00 | 3.00 | 235/279 |
    | XX.X.XXX.109 | XX.X.XXX.110 | 32.00 | 3.04 | 1.00 | 2.70 | 1.00 | 2.00 | 4.00 | 252/279 |
    | XX.X.XXX.111 | XX.X.XXX.110 |  9.00 | 1.51 | 1.00 | 1.02 | 1.00 | 1.00 | 2.00 | 152/280 |
    | XX.X.XXX.112 | XX.X.XXX.110 | 11.00 | 2.21 | 1.00 | 1.31 | 1.00 | 2.00 | 3.00 | 231/279 |
    | XX.X.XXX.109 | XX.X.XXX.111 |  9.00 | 2.01 | 1.00 | 1.20 | 1.00 | 2.00 | 2.00 | 202/279 |
    | XX.X.XXX.110 | XX.X.XXX.111 | 10.00 | 2.70 | 1.00 | 1.68 | 1.00 | 2.00 | 3.00 | 244/279 |
    | XX.X.XXX.112 | XX.X.XXX.111 |  4.00 | 1.46 | 1.00 | 0.76 | 1.00 | 1.00 | 2.00 | 135/279 |
    | XX.X.XXX.109 | XX.X.XXX.112 |  5.00 | 1.56 | 1.00 | 0.85 | 1.00 | 1.00 | 2.00 | 150/279 |
    | XX.X.XXX.110 | XX.X.XXX.112 |  6.00 | 2.05 | 1.00 | 1.18 | 1.00 | 2.00 | 3.00 | 234/279 |
    | XX.X.XXX.111 | XX.X.XXX.112 | 16.00 | 3.26 | 1.00 | 2.24 | 2.00 | 3.00 | 4.00 | 261/280 |
    +--------------+--------------+-------+------+------+------+------+------+------+---------+

    Most of the 450+ Cluster Health checks inside of Prism, all with automatic alerting:

    CVM | CPU
    CPU Utilization

    Load Level

    Node Avg Load – Critical

    CVM | Disk
    Boot RAID Health

    Disk Configuration

    Disk Diagnostic Status

    Disk Metadata Usage

    Disk Offline Status

    HDD Disk Usage

    HDD I/O Latency

    HDD S.M.A.R.T Health Status

    Metadata Disk Mounted Check

    Metro Vstore Mount Status

    Non SED Disk Inserted Check

    Nutanix System Partitions Usage High

    Password Protected Disk Status

    Physical Disk Remove Check

    Physical Disk Status

    SED Operation Status

    SSD I/O Latency

    CVM | Hardware
    Agent VM Restoration

    FT2 Configuration

    Host Evacuation Status

    Node Status

    VM HA Healing Status

    VM HA Status

    VMs Restart Status

    CVM | Memory
    CVM Memory Pinned Check

    CVM Memory Usage

    Kernel Memory Usage

    CVM | Network
    CVM IP Address Configuration

    CVM NTP Time Synchronization

    Duplicate Remote Cluster ID Check

    Host IP Pingable

    IP Configuration

    SMTP Configuration

    Subnet Configuration

    Virtual IP Configuration

    vCenter Connection Check

    CVM | Protection Domain
    Entities Restored Check

    Restored Entities Protected

    CVM | Services
    Admin User API Authentication Check

    CVM Rebooted Check

    CVM Services Status

    Cassandra Waiting For Disk Replacement

    Certificate Creation Status

    Cluster In Override Mode

    Cluster In Read-Only Mode

    Curator Job Status

    Curator Scan Status

    Kerberos Clock Skew Status

    Metadata Drive AutoAdd Disabled Check

    Metadata Drive Detached Check

    Metadata Drive Failed Check

    Metadata Drive Ring Check

    Metadata DynRingChangeOp Slow Check

    Metadata DynRingChangeOp Status

    Metadata Imbalance Check

    Metadata Size

    Node Degradation Status

    RemoteSiteHighLatency

    Stargate Responsive

    Stargate Status

    Upgrade Bundle Available

    CVM | Storage Capacity
    Compression Status

    Finger Printing Status

    Metadata Usage

    NFS Metadata Size Overshoot

    On-Disk Dedup Status

    Space Reservation Status

    vDisk Block Map Usage

    vDisk Block Map Usage Warning

    Cluster | CPU
    CPU type on chassis check

    Cluster | Disk
    CVM startup dependency check

    Disk online check

    Duplicate disk id check

    Flash Mode Configuration

    Flash Mode Enabled VM Power Status

    Flash Mode Usage

    Incomplete disk removal

    Storage Pool Flash Mode Configuration

    System Defined Flash Mode Usage Limit

    Cluster | Hardware
    Power Supply Status

    Cluster | Network
    CVM Passwordless Connectivity Check

    CVM to CVM Connectivity

    Duplicate CVM IP check

    NIC driver and firmware version check

    Time Drift

    Cluster | Protection Domain
    Duplicate VM names

    Internal Consistency Groups Check

    Linked Clones in high frequency snapshot schedule

    SSD Snapshot reserve space check

    Snapshot file location check

    Cluster | Remote Site
    Cloud Remote Alert

    Remote Site virtual external IP(VIP)

    Cluster | Services
    AWS Instance Check

    AWS Instance Type Check

    Acropolis Dynamic Scheduler Status

    Alert Manager Service Check

    Automatic Dedup disabled check

    Automatic disabling of Deduplication

    Backup snapshots on metro secondary check

    CPS Deployment Evaluation Mode

    CVM same timezone check

    CVM virtual hardware version check

    Cassandra Similar Token check

    Cassandra metadata balanced across CVMs

    Cassandra nodes up

    Cassandra service status check

    Cassandra tokens consistent

    Check that cluster virtual IP address is part of cluster external subnet

    Checkpoint snapshot on Metro configured Protection Domain

    Cloud Gflags Check

    Cloud Remote Version Check

    Cloud remote check

    Cluster NCC version check

    Cluster version check

    Compression disabled check

    Curator scan time elapsed check

    Datastore VM Count Check

    E-mail alerts check

    E-mail alerts contacts configuration

    HTTP proxy check

    Hardware configuration validation

    High disk space usage

    Hypervisor version check

    LDAP configuration

    Linked clones on Dedup check

    Multiple vCenter Servers Discovered

    NGT CA Setup Check

    Oplog episodes check

    Pulse configuration

    RPO script validation on storage heavy cluster

    Remote Support Status

    Report Generation Failure

    Report Quota Scan Failure

    Send Report Through E-mail Failure

    Snapshot chain height check

    Snapshots space utilization status

    Storage Pool SSD tier usage

    Stretch Connectivity Lost

    VM group Snapshot and Current Mismatch

    Zookeeper active on all CVMs

    Zookeeper fault tolerance check

    Zookeeper nodes distributed in multi-block cluster

    vDisk Count Check

    Cluster | Storage Capacity
    Erasure Code Configuration

    Erasure Code Garbage

    Erasure coding pending check

    Erasure-Code-Delay Configuration

    High Space Usage on Storage Container

    Storage Container RF Status

    Storage Container Space Usage

    StoragePool Space Usage

    Volume Group Space Usage

    Data Protection | Protection Domain
    Aged Third-party Backup Snapshot Check

    Check VHDX Disks

    Clone Age Check

    Clone Count Check

    Consistency Group Configuration

    Cross Hypervisor NGT Installation Check

    EntityRestoreAbort

    External iSCSI Attachments Not Snapshotted

    Failed To Mount NGT ISO On Recovery of VM

    Failed To Recover NGT Information

    Failed To Recover NGT Information for VM

    Failed To Snapshot Entities

    Incorrect Cluster Information in Remote Site

    Metadata Volume Snapshot Persistent

    Metadata Volume Snapshot Status

    Metro Availability

    Metro Availability Prechecks Failed

    Metro Availability Secondary PD sync check

    Metro Old Primary Site Hosting VMs

    Metro Protection domain VMs running at Sub-optimal performance

    Metro Vstore Symlinks Check

    Metro/Vstore Consistency Group File Count Check

    Metro/Vstore Protection Domain File Count Check

    NGT Configuration

    PD Active

    PD Change Mode Status

    PD Full Replication Status

    PD Replication Expiry Status

    PD Replication Skipped Status

    PD Snapshot Retrieval

    PD Snapshot Status

    PD VM Action Status

    PD VM Registration Status

    Protected VM CBR Capablity

    Protected VM Not Found

    Protected VMs Not Found

    Protected VMs Storage Configuration

    Protected Volume Group Not Found

    Protected Volume Groups Not Found

    Protection Domain Decoupled Status

    Protection Domain Initial Replication Pending to Remote Site

    Protection Domain Replication Stuck

    Protection Domain Snapshots Delayed

    Protection Domain Snapshots Queued for Replication to Remote Site

    Protection Domain VM Count Check

    Protection Domain fallback to lower frequency replications to remote

    Protection Domain transitioning to higher frequency snapshot schedule

    Protection Domain transitioning to lower frequency snapshot schedule

    Protection Domains sharing VMs

    Related Entity Protection Status

    Remote Site NGT Support

    Remote Site Snapshot Replication Status

    Remote Stargate Version Check

    Replication Of Deduped Entity

    Self service restore operation Failed

    Snapshot Crash Consistent

    Snapshot Symlink Check

    Storage Container Mount

    Updating Metro Failure Handling Failed

    Updating Metro Failure Handling Remote Failed

    VM Registration Failure

    VM Registration Warning

    VSS Scripts Not Installed

    VSS Snapshot Status

    VSS VM Reachable

    VStore Snapshot Status

    Volume Group Action Status

    Volume Group Attachments Not Restored

    Vstore Replication To Backup Only Remote

    Data Protection | Remote Site
    Automatic Promote Metro Availability

    Cloud Remote Operation Failure

    Cloud Remote Site failed to start

    LWS store allocation in remote too long

    Manual Break Metro Availability

    Manual Promote Metro Availability

    Metro Connectivity

    Remote Site Health

    Remote Site Network Configuration

    Remote Site Network Mapping Configuration

    Remote Site Operation Mode ReadOnly

    Remote Site Tunnel Status

    Data Protection | Witness
    Authentication Failed in Witness

    Witness Not Configured

    Witness Not Reachable

    File server | Host
    File Server Upgrade Task Stuck Check

    File Server VM Status

    Multiple File Server Versions Check

    File server | Network
    File Server Entities Not Protected

    File Server Invalid Snapshot Warning

    File Server Network Reachable

    File Server PD Active On Multiple Sites

    File Server Reachable

    File Server Status

    Remote Site Not File Server Capable

    File server | Services
    Failed to add one or more file server admin users or groups

    File Server AntiVirus – All ICAP Servers Down

    File Server AntiVirus – Excessive Quarantined / Unquarantined Files

    File Server AntiVirus – ICAP Server Down

    File Server AntiVirus – Quarantined / Unquarantined Files Limit Reached

    File Server AntiVirus – Scan Queue Full on FSVM

    File Server AntiVirus – Scan Queue Piling Up on FSVM

    File Server Clone – Snapshot invalid

    File Server Clone Failed

    File Server Rename Failed

    Maximum connections limit reached on a file server VM

    Skipped File Server Compatibility Check

    File server | Storage Capacity
    FSVM Time Drift Status

    Failed To Run File Server Metadata Fixer Successfully

    Failed To Set VM-to-VM Anti Affinity Rule

    File Server AD Connectivity Failure

    File Server Activation Failed

    File Server CVM IP update failed

    File Server DNS Updates Pending

    File Server Home Share Creation Failed

    File Server In Heterogeneous State

    File Server Iscsi Discovery Failure

    File Server Join Domain Status

    File Server Network Change Failed

    File Server Node Join Domain Status

    File Server Performance Optimization Recommended

    File Server Quota allocation failed for user

    File Server Scale-out Status

    File Server Share Deletion Failed

    File Server Site Not Found

    File Server Space Usage

    File Server Space Usage Critical

    File Server Storage Cleanup Failure

    File Server Storage Status

    File Server Unavailable Check

    File Server Upgrade Failed

    Incompatible File Server Activation

    Share Utilization Reached Configured Limit

    Host | CPU
    CPU Utilization

    Host | Disk
    All-flash Node Intermixed Check

    Host disk usage high

    NVMe Status Check

    SATA DOM 3ME Date and Firmware Status

    SATA DOM Guest VM Check

    SATADOM Connection Status

    SATADOM Status

    SATADOM Wearout Status

    SATADOM-SL 3IE3 Wearout Status

    Samsung PM1633 FW Version

    Samsung PM1633 Version Compatibility

    Samsung PM1633 Wearout Status

    Samsung PM863a config check

    Toshiba PM3 Status

    Toshiba PM4 Config

    Toshiba PM4 FW Version

    Toshiba PM4 Status

    Toshiba PM4 Version Compatibility

    Host | Hardware
    CPU Temperature Fetch

    CPU Temperature High

    CPU Voltage

    CPU-VRM Temperature

    Correctable ECC Errors 10 Days

    Correctable ECC Errors One Day

    DIMM Voltage

    DIMM temperature high

    DIMM-VRM Temperature

    Fan Speed High

    Fan Speed Low

    GPU Status

    GPU Temperature High

    Hardware Clock Status

    IPMI SDR Status

    SAS Connectivity

    System temperature high

    Host | Memory
    Memory Swap Rate

    Ram Fault Status

    Host | Network
    10 GbE Compliance

    Hypervisor IP Address Configuration

    IPMI IP Address Configuration

    Mellanox NIC Mixed Family check

    Mellanox NIC Status check

    NIC Flapping Check

    NIC Link Down

    Node NIC Error Rate High

    Receive Packet Loss

    Transmit Packet Loss

    Host | Services
    Datastore Remount Status

    Node | Disk
    Boot device connection check

    Boot device status check

    Descriptors to deleted files check

    FusionIO PCIE-SSD: ECC errors check

    Intel Drive: ECC errors

    Intel SSD Configuration

    LSI Disk controller firmware status

    M.2 Boot Disk change check

    M.2 Intel S3520 host boot drive status check

    M.2 Micron5100 host boot drive status check

    SATA controller

    SSD Firmware Check

    Samsung PM863a FW version check

    Samsung PM863a status check

    Samsung PM863a version compatibility check

    Samsung SM863 SSD status check

    Samsung SM863a version compatibility check

    Node | Hardware
    IPMI connectivity check

    IPMI sel assertions check

    IPMI sel log fetch check

    IPMI sel power failure check

    IPMI sensor values check

    M10 GPU check

    M10 and M60 GPU Mixed check

    M60 GPU check

    Node | Network
    CVM 10 GB uplink check

    Inter-CVM connectivity check

    NTP configuration check

    Storage routed to alternate CVM check

    Node | Protection Domain
    ESX VM Virtual Hardware Version Compatible

    Node | Services
    .dvsData directory in local datastore

    Advanced Encryption Standard (AES) enabled

    Autobackup check

    BMC BIOS version check

    CVM memory check

    CVM port group renamed

    Cassandra Keyspace/Column family check

    Cassandra memory usage

    Cassandra service restarts check

    Cluster Services Down Check

    DIMM Config Check

    DIMMs Interoperability Check

    Deduplication efficiency check

    Degraded Node check

    Detected VMs with non local data

    EOF check

    ESXi AHCI Driver version check

    ESXi APD handling check

    ESXi CPU model and UVM EVC mode check

    ESXi Driver compatibility check

    ESXi NFS hearbeat timeout check

    ESXi RAM disk full check

    ESXi RAM disk root usage

    ESXi Scratch Configuration

    ESXi TCP delayed ACK check

    ESXi VAAI plugin enabled

    ESXi VAAI plugin installed

    ESXi configured VMK check

    ESXi services check

    ESXi version compatibility

    File permissions check

    Files in a stretched VM should be in the same Storage Container

    GPU drivers installed

    Garbage egroups check

    Host passwordless SSH

    Ivy Bridge performance check

    Mellanox NIC Driver version check

    NFS file count check

    NSC(Nutanix Service Center) server FQDN resolution

    NTP server FQDN resolution

    Network adapter setting check

    Non default gflags check

    Notifications dropped check

    PYNFS dependency check

    RC local script exit statement present

    Remote syslog server check

    SMTP server FQDN resolution

    Sanity check on local.sh

    VM IDE bus check

    VMKNICs subnets check

    VMware hostd service check

    Virtual IP check

    Zookeeper Alias Check

    localcli check

    vim command check

    Nutanix Guest Tools | VM
    PostThaw Script Execution Failed

    Other Checks
    LWS Store Full

    LWS store allocation too long

    Recovery Point Objective Cannot Be Met

    VM | CPU
    CPU Utilization

    VM | Disk
    I/O Latency

    Orphan VM Snapshot Check

    VM | Memory
    Memory Pressure

    Memory Swap Rate

    VM | Network
    Memory Usage

    Receive Packet Loss

    Transmit Packet Loss

    VM | Nutanix Guest Tools
    Disk Configuration Update Failed

    VM Guest Power Op Failed

    iSCSI Configuration Failed

    VM | Remote Site
    VM Virtual Hardware Version Compatible

    VM | Services
    VM Action Status

    VM | Virtual Machine
    Application Consistent Snapshot Skipped

    NGT Mount Failure

    NGT Version Incompatible

    Temporary Hypervisor Snapshot Cleanup Failed

    VSS Snapshot Aborted

    VSS Snapshot Not Supported

    Host | Network
    Hypervisor time synchronized

    Nov
    13

    NetBackup Got Upgraded: Modern Workloads use Parallel Streaming.

    A new capability in NetBackup 8.1 is its ability to protect modern web-scale and big data workloads like Hadoop and NoSQL that generate massive amounts of data. With the new Veritas Parallel Streaming technology, these modern scale-out workloads can be backed up and protected with extreme efficiency by leveraging the power of multiple nodes simultaneously. The result is that organizations can now adopt these modern workloads with confidence, knowing that their data, even in massive volumes, will be protected. And since new workloads can be added via a plug-in rather than a software agent, organizations can add new workloads without having to wait for a next NetBackup software release. NetBackup Parallel Streaming also supports workloads running on hyper-converged infrastructure (HCI) from Nutanix, as Nutanix and Veritas have partnered to certify protection of those workloads on HCI.

    source


    NetBackup takes care of utilizing all of the Nutanix storage controllers. NetBackup mounts the local storage controller for each node over NFS, removing the need for a proxy host.

    For some general best practices with the first release of this plugin:

      * Apply a limit to back up a maximum of n*4 VMs concurrently, where n is the number of nodes in the Nutanix cluster. A 16-node cluster would then have 64 VMs being backed up concurrently.

      * Use a Media Server as the Backup Host if possible.

    Note: If VMs are powered off, then the Prism Element VIP will be used.

    Oct
    27

    Mounting and Enabling NGT Results in an Error Message….. CD/DVD-ROM Drive

    Nutanix has a KB ARTICLE 3455

    Mounting and Enabling NGT Results in an Error Message that Indicates that the VM does not have a CD/DVD-ROM Drive

    If you enable Nutanix Guest Tools (NGT) in the Prism web console, the following error message is displayed.

    This VM does not have a CD/DVD-ROM device.
    OR
    Guest Tools cannot be mounted as there not enough empty CD-ROM(s)

    This error message is displayed even though the VM has the CD/DVD device.

    You can go ahead and read the KB, but it's caused by newer VMware versions using a SATA controller instead of IDE for the CD-ROM. On my VM it kept switching back to SATA from IDE. I got around it by adding a 2nd CD-ROM that was IDE.