Nov 22

What are all those lines on my screen? – Nutanix Analysis & Stats Unplugged.

The Analysis dashboard lets you create charts that dynamically monitor a variety of performance measures. The graphical view eases troubleshooting: it puts the right side of your brain to work so you can see problems that correlate with metrics, events and alerts. To open the Analysis dashboard, select Analysis from the pull-down list on the far left of the main menu in Prism. You can also click any performance graph in Prism; it takes you to the Analysis section and loads that graph there.

Layer different metrics with events & alerts to see how healthy your environment is doing.

You can create any number of charts. There are two types of charts: metric and entity. A metric chart monitors a single metric for one or more entities. An entity chart monitors one or more metrics for a single entity. For example, IOPS is a metric and a VM is an entity.
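
To make the difference between the two chart types concrete, here is a minimal Python sketch. The class names and VM names are made up for illustration and are not part of Prism; the metric IDs come from the Chart Metrics list later in this post.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MetricChart:
    """One metric plotted for one or more entities."""
    title: str
    metric: str
    entities: List[str]

    def series(self) -> List[Tuple[str, str]]:
        # One line on the chart per entity, all showing the same metric.
        return [(entity, self.metric) for entity in self.entities]


@dataclass
class EntityChart:
    """One or more metrics plotted for a single entity."""
    title: str
    entity: str
    metrics: List[str]

    def series(self) -> List[Tuple[str, str]]:
        # One line on the chart per metric, all for the same entity.
        return [(self.entity, metric) for metric in self.metrics]


# Hypothetical VM names, used only for the example.
iops_by_vm = MetricChart("IOPS by VM", "STATS_CONTROLLER_NUM_IOPS", ["vm-a", "vm-b"])
vm_a_overview = EntityChart("vm-a overview", "vm-a",
                            ["STATS_CONTROLLER_NUM_IOPS", "STATS_CONTROLLER_AVG_IO_LATENCY"])
```

Either way, each line on a chart is one (entity, metric) pair; the chart type only decides which side of the pair stays fixed.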

You can track more than 50 metrics on the Nutanix side.

Sampling
The Analysis page pulls data at different sampling intervals based on the requested time range. The longer the time range, the longer the sampling interval.

The Range Picker displays a time line that sets the duration for the monitor displays. To set the time interval, click the time period (1 hour, 1 day, 1 week, WTD [week to date], 1 month) from the list at the top. To customize the monitor duration, drag the time line end points to the desired times on the time line.

The sampling interval used for each time range is shown below:

3 hours – 30 secs
6 hours – 30 secs
1 day – 5 mins
1 week – 15 mins
WTD – 15 mins
1 month – 30 mins
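
As a rough sketch of what this list means in practice, the Python below looks up the interval for a requested range and estimates how many data points a chart would carry. It rests on assumptions of mine: "1 month" is treated as 30 days, WTD shares the 1 week row because it can never be longer than a week, and a custom range is rounded up to the next listed row. The post only gives the listed ranges, so treat anything in between as a guess.

```python
from datetime import timedelta

# (longest range covered, sampling interval), copied from the list above.
# Assumption: "1 month" means 30 days; WTD is covered by the 1 week row.
SAMPLING_TABLE = [
    (timedelta(hours=3), timedelta(seconds=30)),
    (timedelta(hours=6), timedelta(seconds=30)),
    (timedelta(days=1), timedelta(minutes=5)),
    (timedelta(days=7), timedelta(minutes=15)),
    (timedelta(days=30), timedelta(minutes=30)),
]


def sampling_interval(time_range: timedelta) -> timedelta:
    """Return the interval of the first listed row that covers time_range."""
    for longest_range, interval in SAMPLING_TABLE:
        if time_range <= longest_range:
            return interval
    # Ranges beyond the list fall back to the coarsest interval (assumption).
    return SAMPLING_TABLE[-1][1]


def expected_points(time_range: timedelta) -> int:
    """Approximate number of samples per series for the given range."""
    return time_range // sampling_interval(time_range)
```

With these numbers a 3 hour chart holds about 360 samples per series, a 1 day chart about 288, and a 1 week chart about 672, which is why short spikes that are visible at 30 seconds can flatten out on the longer views.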

The stats are managed by Arithmos in Prism and persisted in the Nutanix implementation of the NoSQL database Cassandra. The amount of space used is proportional to the number of VMs and virtual disks. A cluster with 100 VMs and 100 virtual disks should use around 2GB per node. Because Arithmos compresses the stats in Cassandra, the actual number should be lower. Stats are stored in Cassandra for 90 days. This retention is controlled by a gflag, so it can be changed if needed, but the best option is to use Prism Central to store stats longer. Customers can also access the stats through the REST APIs and store them somewhere else if needed.

It’s important to note that if you’re not using the Acropolis hypervisor and you run a management tool such as vCenter or SCVMM, Nutanix stat collection has no impact on those tools. For example, it will not make vCenter slower. All stats, regardless of hypervisor, are taken directly from the host. No separate database or license is needed to maintain performance monitoring.

You can use the REST-API explorer to help build your code.
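
If you want to pull stats out and keep them elsewhere, a request might be assembled along the lines of the Python sketch below. Everything specific in it is an assumption on my part and should be checked in the REST API Explorer for your version: the port, the `/PrismGateway/services/rest/v1` base path, the `stats` resource, the query parameter names, and the metric name `controller_num_iops`. The host and VM ID are placeholders. The sketch only builds the URL; it does not contact a cluster.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_usecs(moment: datetime) -> int:
    """Convert a timezone-aware datetime to microseconds since the Unix epoch."""
    return (moment - EPOCH) // timedelta(microseconds=1)


def build_stats_url(host, entity_type, entity_id, metrics,
                    start, end, interval_secs, port=9440):
    """Build a stats query URL.

    Assumed, not confirmed by this post: the base path, the 'stats'
    resource and the parameter names below. Verify them in the
    REST API Explorer before relying on this.
    """
    query = urlencode({
        "metrics": ",".join(metrics),
        "startTimeInUsecs": to_usecs(start),
        "endTimeInUsecs": to_usecs(end),
        "intervalInSecs": interval_secs,
    })
    base = f"https://{host}:{port}/PrismGateway/services/rest/v1"
    return f"{base}/{entity_type}/{entity_id}/stats/?{query}"


# Placeholder host and VM ID; a 3 hour window sampled every 30 seconds.
window_end = datetime(2015, 11, 22, 12, 0, tzinfo=timezone.utc)
window_start = window_end - timedelta(hours=3)
url = build_stats_url("prism.example.com", "vms", "my-vm-id",
                      ["controller_num_iops"], window_start, window_end, 30)
```

Asking for an interval that matches the range (30 seconds for a few hours, 5 minutes for a day) keeps the exported data in line with what the charts show.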

Why are my IO Stats different?

A variety of statistics are displayed in the web console screens. There are three possible sources for a statistic:

Hypervisor. When the hypervisor can provide usage statistics, those figures are displayed where appropriate. ESXi provides such statistics, but Hyper-V and AHV do not. Getting the statistics from ESXi means numbers displayed in the web console should match the corresponding ones in vCenter.

Controller (Stargate). When hypervisor statistics are unavailable or inappropriate, the Controller VM provides statistics from Stargate (see Cluster Components). Controller-reported statistics might differ from those reported by the hypervisor for the following reasons:

    An NFS client might break up large I/O requests into smaller I/Os before issuing them to the NFS server, thus increasing the number of operations reported by the controller.
    The hypervisor might read I/Os from the cache in the hypervisor, which are not counted by the controller.

Disk (Stargate). Stargate can provide statistics from both the controller and disk perspective. The difference is that the controller perspective includes read I/Os from memory as well as disk I/Os, while the disk perspective includes just the disk I/Os.
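
A little arithmetic shows how the three sources can legitimately disagree. The Python below is only a toy model of the reasons listed above, with invented numbers: the 64 KiB split size and the cache hit counts are illustrative, not values reported by Nutanix or any hypervisor.

```python
def controller_ops_for_request(request_bytes: int, max_client_io_bytes: int) -> int:
    """Operations the controller sees when the NFS client splits one large request."""
    # Ceiling division: a partial final chunk still counts as one operation.
    return -(-request_bytes // max_client_io_bytes)


def read_counts_by_perspective(guest_reads: int,
                               hypervisor_cache_hits: int,
                               controller_memory_hits: int) -> dict:
    """Read operations counted at each layer, ignoring request splitting."""
    # Reads served from the hypervisor cache never reach the controller.
    controller = guest_reads - hypervisor_cache_hits
    # The disk perspective drops reads the controller served from memory.
    disk = controller - controller_memory_hits
    return {"hypervisor": guest_reads, "controller": controller, "disk": disk}


# One 1 MiB request split into 64 KiB pieces (hypothetical split size).
split_ops = controller_ops_for_request(1024 * 1024, 64 * 1024)

# 1000 guest reads, 200 answered by the hypervisor cache, 300 by controller memory.
counts = read_counts_by_perspective(1000, 200, 300)
```

In this model a single large guest request shows up as 16 controller operations, while 1000 guest reads shrink to 800 at the controller and 500 at the disk. None of those numbers is wrong; they are measured at different layers.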

The Analysis section is included with every license.

Chart Metrics

These metrics can be added to charts.

Metric: Description

Hypervisor IOPS: Input/Output operations per second from Hypervisor.
ID: STATS_HYP_NUM_IOPS

Hypervisor IOPS – Read: Input/Output read operations per second from Hypervisor.
ID: STATS_HYP_NUM_READ_IOPS

Hypervisor IOPS – Write: Input/Output write operations per second from Hypervisor.
ID: STATS_HYP_NUM_WRITE_IOPS

Disk IOPS: Input/Output operations per second from disk.
ID: STATS_NUM_IOPS

Disk IOPS – Read: Input/Output read operations per second from disk.
ID: STATS_NUM_READ_IOPS

Disk IOPS – Write: Input/Output write operations per second from disk.
ID: STATS_NUM_WRITE_IOPS

Read IOPS (%): Percent of IOPS that are reads.
ID: STATS_READ_IO_PPM

Write IOPS (%): Percent of IOPS that are writes.
ID: STATS_WRITE_IO_PPM

Hypervisor I/O Bandwidth: Data transferred per second in KB/second from Hypervisor.
ID: STATS_HYP_BANDWIDTH

Hypervisor I/O Bandwidth – Read: Read data transferred per second in KB/second from Hypervisor.
ID: STATS_HYP_READ_BANDWIDTH

Hypervisor I/O Bandwidth – Write: Write data transferred per second in KB/second from Hypervisor.
ID: STATS_HYP_WRITE_BANDWIDTH

Disk I/O Bandwidth: Data transferred per second in KB/second from disk.
ID: STATS_BANDWIDTH

Disk I/O Bandwidth – Read: Read data transferred per second in KB/second from disk.
ID: STATS_READ_BANDWIDTH

Disk I/O Bandwidth – Write: Write data transferred per second in KB/second from disk.
ID: STATS_WRITE_BANDWIDTH

Hypervisor I/O Latency: I/O latency in milliseconds from Hypervisor.
ID: STATS_HYP_AVG_IO_LATENCY

Hypervisor I/O Latency – Read: I/O read latency in milliseconds from Hypervisor.
ID: STATS_HYP_AVG_READ_IO_LATENCY

Hypervisor I/O Latency – Write: I/O write latency in milliseconds from Hypervisor.
ID: STATS_HYP_AVG_WRITE_IO_LATENCY

Disk I/O Latency: I/O latency in milliseconds from disk.
ID: STATS_AVG_IO_LATENCY

Hypervisor CPU Usage (%): Percent of CPU used by the hypervisor.
ID: STATS_HYP_CPU_USAGE

Hypervisor Memory Usage (%): Percent of memory used by the hypervisor.
ID: STATS_HYP_MEMORY_USAGE

Transformed Usage: Actual usage of storage.
ID: STATS_TRANSFORMED_USAGE

Untransformed Usage: Logical usage of storage (before compression/deduplication).
ID: STATS_UNTRANSFORMED_USAGE

Replication Bytes – Transmitted: Number of bytes transmitted.
ID: STATS_REP_NUM_TRANSMITTED_BYTES

Replication Bytes – Total Transmitted: Total number of bytes transmitted.
ID: STATS_REP_TOT_TRANSMITTED_BYTES

Replication Bytes – Received: Number of bytes received.
ID: STATS_REP_NUM_RECEIVED_BYTES

Replication Bytes – Total Received: Total number of bytes received.
ID: STATS_REP_TOT_RECEIVED_BYTES

Replication Bandwidth – Transmitted: Replication data transferred per second in KB/second.
ID: STATS_REP_BW_TRANSFERRED

Replication Bandwidth – Received: Replication data received per second in KB/second.
ID: STATS_REP_BW_RECEIVED

Storage Controller IOPS: Input/Output operations per second from the Storage Controller.
ID: STATS_CONTROLLER_NUM_IOPS

Storage Controller IOPS – Read: Input/Output read operations per second from the Storage Controller.
ID: STATS_CONTROLLER_NUM_READ_IOPS

Storage Controller IOPS – Write: Input/Output write operations per second from the Storage Controller.
ID: STATS_CONTROLLER_NUM_WRITE_IOPS

Storage Controller Bandwidth: Data transferred in KB/second from the Storage Controller.
ID: STATS_CONTROLLER_BANDWIDTH

Storage Controller Bandwidth – Read: Read data transferred in KB/second from the Storage Controller.
ID: STATS_CONTROLLER_READ_BANDWIDTH

Storage Controller Bandwidth – Write: Write data transferred in KB/second from the Storage Controller.
ID: STATS_CONTROLLER_WRITE_BANDWIDTH

Storage Controller IOPS – Read (%): Percent of Storage Controller IOPS that are reads.
ID: STATS_CONTROLLER_READ_IO_PPM

Storage Controller IOPS – Write (%): Percent of Storage Controller IOPS that are writes.
ID: STATS_CONTROLLER_WRITE_IO_PPM

Storage Controller Latency: I/O latency in milliseconds from the Storage Controller.
ID: STATS_CONTROLLER_AVG_IO_LATENCY

Storage Controller Latency – Read: Storage Controller read latency in milliseconds.
ID: STATS_CONTROLLER_AVG_READ_IO_LATENCY

Storage Controller Latency – Write: Storage Controller write latency in milliseconds.
ID: STATS_CONTROLLER_AVG_WRITE_IO_LATENCY

Content Cache Hits: Number of hits on the content cache.
ID: CONTENT_CACHE_NUM_HITS

Content Cache Lookups: Number of lookups on the content cache.
ID: CONTENT_CACHE_NUM_LOOKUPS

Content Cache Hit Rate (%): Content cache hits over all lookups.
ID: CONTENT_CACHE_HIT_PPM

Content Cache Reference Count: Average number of content cache references.
ID: CONTENT_CACHE_NUM_DEDUP_REF_COUNT_PPH

Content Cache Physical Memory Usage: Real memory (in bytes) used to cache data by the content cache.
ID: CONTENT_CACHE_PHYSICAL_MEMORY_USAGE_BYTES

Content Cache SSD Usage: Real SSD usage (in bytes) used to cache data by the content cache.
ID: CONTENT_CACHE_PHYSICAL_SSD_USAGE_BYTES

Content Cache Logical Memory Usage: Logical memory (in bytes) used to cache data without deduplication.
ID: CONTENT_CACHE_LOGICAL_MEMORY_USAGE_BYTES

Content Cache Logical SSD Usage: Logical SSD memory (in bytes) used to cache data without deduplication.
ID: CONTENT_CACHE_LOGICAL_SSD_USAGE_BYTES

Content Cache Memory Saved: Memory (in bytes) saved due to content cache deduplication.
ID: CONTENT_CACHE_SAVED_MEMORY_USAGE_BYTES

Content Cache SSD Usage Saved: SSD usage (in bytes) saved due to content cache deduplication.
ID: CONTENT_CACHE_SAVED_SSD_USAGE_BYTES

Deduplication Fingerprints Cleared: Number of written bytes for which fingerprints have been cleared.
ID: DEDUP_FINGERPRINT_CLEARED_BYTES

Deduplication Fingerprints Written: Number of written bytes for which fingerprints have been added.
ID: DEDUP_FINGERPRINT_ADDED_BYTES
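
Several of the percentage metrics above carry IDs ending in _PPM. Assuming that suffix stands for parts per million (my reading of the name, not something the list states), a raw value pulled through the API needs a small conversion before it matches the percentage on the chart. The Python sketch below shows that conversion, along with the hit rate as defined above (hits over all lookups). The function names are mine.

```python
def ppm_to_percent(ppm: float) -> float:
    """Convert parts per million to percent (1,000,000 ppm == 100%)."""
    return ppm / 10_000


def hit_rate_percent(hits: int, lookups: int) -> float:
    """Content cache hits over all lookups, expressed as a percentage."""
    if lookups == 0:
        # No lookups in the sample window, so no meaningful rate; report 0.
        return 0.0
    return 100.0 * hits / lookups


# Invented sample values, for illustration only.
read_share = ppm_to_percent(650_000)   # raw value for a *_READ_IO_PPM metric
cache_rate = hit_rate_percent(750, 1_000)
```

Under that assumption a raw value of 650,000 corresponds to 65% reads, and 750 hits out of 1,000 lookups is a 75% hit rate.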
