Oct
17

My Thoughts On The Kubernetes Support For Docker EE

Oct
02

Automatically Snap, Clone and Backup AFS (Acropolis File Services)

I wrote a script, posted on the Nutanix Next community site, that automatically snaps and clones AFS shares so that any backup product that can read from an SMB share can back them up. The script can be used to always have the latest backup copy available while avoiding impact to your production users.

24-Hour Backup Window with Nutanix Native File Services: https://next.nutanix.com/t5/Nutanix-Connect-Blog/24-Hour-Backup-Window-with-Nutanix-Native-File-Services/ba-p/23708

Hope you find it useful.

Sep
25

Maximize Your ROI and Agility with Big Data #Nutanix #Docker #BlueData

Separate out your data from your compute for more agility.

The DataNode is what is used to build out HDFS. Typically the DataNode and the NodeManager are co-located on the same host, whether it's physical or virtual. The NodeManager is responsible for launching and managing containers that are scheduled by the ResourceManager. On Nutanix, if you virtualize the DataNode and the NodeManager in separate virtual machines, you have the opportunity to increase your agility. The agility comes from the ability to use your resources to the max of your capacity at all times. When the cluster isn't in use or as busy, other systems have the opportunity to use the resources. You can shut down the NodeManagers, since they're not responsible for persisting data, and make the CPU and memory available for another project like Spark or maybe a new machine-learning program someone wants to test out.
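
To make that concrete, here is a minimal sketch of freeing up compute this way, assuming Hadoop 2.x shell scripts and some hypothetical compute-only VM hostnames:

for host in nm-worker-01 nm-worker-02 nm-worker-03; do   # hypothetical NodeManager-only VMs
  # Stop only the NodeManager daemon; the DataNode VMs are untouched,
  # so HDFS keeps serving data while the compute capacity is freed up.
  # Single quotes so $HADOOP_HOME expands on the remote host.
  ssh "$host" '$HADOOP_HOME/sbin/yarn-daemon.sh stop nodemanager'
done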

Hold the phone! What about data locality? You are correct: performance is going to take a hit. Performance may drop by up to 15% compared to the standard way, but if your system is only busy 30% of the time, it might be more than worth it. Let's say a job takes 60 minutes to complete. Using this new model of separating out compute and storage, the job may now take 70 minutes to complete. Is the extra 10 minutes worth the agility to use your hardware for other projects? I think so, but that is going to depend on your business requirements, of course.

On the data locality side, the DataNode still gets to benefit from reading locally. Its data path on the network isn't going to cause more stress, so that's a plus. The NodeManager is busy writing all of the temporary and shuffle data locally, so that is also not going to cause any additional stress compared to having the NodeManager write to a remote shared storage device. And in some cases the NodeManager will still talk to the local DataNode over the local hypervisor switch.

If you're after some real flexibility, you could look at using BlueData to run Docker containers alongside the DataNodes. BlueData essentially takes over for the NodeManager. Install some CentOS VMs that fit inside the host's NUMA node and install BlueData. BlueData can help with QoS for different tenants and allows you to run different versions of Hadoop distros, Spark, Kafka and so on without blowing out your data requirements. BlueData also helps to maximize the remote connection between the containers and the HDFS distro of choice.

If you're after more agility, want to avoid separate hardware for projects, need better ROI for systems that run only weekly, monthly or quarterly, or want better testing methodologies, this may be the right architecture for you to try out.

Sep
15

HYCU v1.5 – Nutanix AHV Backup gets new features

HYCU v1.5 has been released by Comtrade.

The biggest one for me is ABS support! Now you can use cheap and deep storage and drive all of the storage controllers. Remember that ADS works with ABS, so it's a great solution.

The following new features and enhancements are available with HYCU version 1.5.0:

Backup window
It is now possible to set up a time frame when your backup jobs are allowed to run (a backup window). For example, this allows you to schedule your backup jobs to run during non-production hours to reduce load during peak hours.

Concurrent backups
You can now specify the maximum number of concurrent backup jobs per target. By doing so, you can reduce the duration of backups and the number of queued backup jobs.

Faster recovery
HYCU enables you to keep snapshots from multiple backups on the Nutanix cluster. Keeping multiple snapshots allows you to recover a virtual machine or an application quickly, reducing downtime. If you know Commvault IntelliSnap, the benefits are very similar.

iSCSI backup target
A new type of backup target for storing the protected data is available: iSCSI, which also makes it possible for you to use a Nutanix volume group as a backup target. You can use the iSCSI backup target for backing up the HYCU backup controller as well.

Improved backup and restore performance
HYCU now utilizes Nutanix volume groups for backing up and restoring data, taking advantage of the load balancing feature offered by Nutanix. Therefore, the new version of HYCU can distribute the workload between several nodes, which results in increased performance of your backup and restore operations and reduced I/O load on the Nutanix cluster and containers.

Support for AWS S3-compatible storage
HYCU enables you to store your protected data on AWS S3-compatible storage.

Shared location for restoring individual files
You can now restore individual files to a shared location so that recovered data can be accessed from multiple systems.

Support for Active Directory applications
In addition to SQL Server applications, HYCU can now also detect and protect Active Directory applications running on virtual machines. You can view all the discovered applications in the Applications panel.

Expiring backups manually
If there is a restore point that you no longer want to use for a data restore, you can mark it as expired. Expired backups are removed from the backup target within the next 24 hours, resulting in more free storage space and helping you to keep your HYCU system clean.

Support for Nutanix API changes
HYCU supports Nutanix API changes introduced with AOS 5.1.1.1.

Sep
14

AOS 5.1.2 Security Updates

A long list of updates; one-click upgrade yourself to safety.

CVE-2017-1000364 kernel: heap or stack gap jumping occurs through unbounded stack allocations (Stack Guard or Stack Clash)

CVE-2017-1000366 glibc: heap or stack gap jumping occurs through unbounded stack allocations (Stack Guard or Stack Clash)

CVE-2017-2628 curl: negotiate not treated as connection-oriented

CVE-2017-3509 OpenJDK: improper re-use of NTLM authenticated connections (Networking, 8163520)

CVE-2017-3511 OpenJDK: untrusted extension directories search path in Launcher (JCE, 8163528)

CVE-2017-3526 OpenJDK: incomplete XML parse tree size enforcement (JAXP, 8169011)

CVE-2017-3533 OpenJDK: newline injection in the FTP client (Networking, 8170222)

CVE-2017-3539 OpenJDK: MD5 allowed for jar verification (Security, 8171121)

CVE-2017-3544 OpenJDK: newline injection in the SMTP client (Networking, 8171533)

CVE-2016-0736 httpd: Padding Oracle in Apache mod_session_crypto

CVE-2016-1546 httpd: mod_http2 denial-of-service by thread starvation

CVE-2016-2161 httpd: DoS vulnerability in mod_auth_digest

CVE-2016-8740 httpd: Incomplete handling of LimitRequestFields directive in mod_http2

CVE-2016-8743 httpd: Apache HTTP Request Parsing Whitespace Defects

CVE-2017-8779 rpcbind, libtirpc, libntirpc: Memory leak when failing to parse XDR strings or bytearrays

CVE-2017-3139 bind: assertion failure in DNSSEC validation

CVE-2017-7502 nss: Null pointer dereference when handling empty SSLv2 messages

CVE-2017-1000367 sudo: Privilege escalation via improper get_process_ttyname() parsing

CVE-2016-8610 SSL/TLS: Malformed plain-text ALERT packets could cause remote DoS

CVE-2017-5335 gnutls: Out of memory while parsing crafted OpenPGP certificate

CVE-2017-5336 gnutls: Stack overflow in cdk_pk_get_keyid

CVE-2017-5337 gnutls: Heap read overflow in read-packet.c

CVE-2017-1000368 sudo: Privilege escalation via improper get_process_ttyname() parsing

CVE-2017-3142 bind: An error in TSIG authentication can permit unauthorized zone transfers

CVE-2017-3143 bind: An error in TSIG authentication can permit unauthorized dynamic updates

CVE-2017-10053 OpenJDK: reading of unprocessed image data in JPEGImageReader (2D, 8169209)

CVE-2017-10067 OpenJDK: JAR verifier incorrect handling of missing digest (Security, 8169392)

CVE-2017-10074 OpenJDK: integer overflows in range check loop predicates (Hotspot, 8173770)

CVE-2017-10078 OpenJDK: Nashorn incompletely blocking access to Java APIs (Scripting, 8171539)

CVE-2017-10081 OpenJDK: incorrect bracket processing in function signature handling (Hotspot, 8170966)

CVE-2017-10087 OpenJDK: insufficient access control checks in ThreadPoolExecutor (Libraries, 8172204)

CVE-2017-10089 OpenJDK: insufficient access control checks in ServiceRegistry (ImageIO, 8172461)

CVE-2017-10090 OpenJDK: insufficient access control checks in AsynchronousChannelGroupImpl (8172465, Libraries)

CVE-2017-10096 OpenJDK: insufficient access control checks in XML transformations (JAXP, 8172469)

CVE-2017-10101 OpenJDK: unrestricted access to com.sun.org.apache.xml.internal.resolver (JAXP, 8173286)

CVE-2017-10102 OpenJDK: incorrect handling of references in DGC (RMI, 8163958)

CVE-2017-10107 OpenJDK: insufficient access control checks in ActivationID (RMI, 8173697)

CVE-2017-10108 OpenJDK: unbounded memory allocation in BasicAttribute deserialization (Serialization, 8174105)

CVE-2017-10109 OpenJDK: unbounded memory allocation in CodeSource deserialization (Serialization, 8174113)

CVE-2017-10110 OpenJDK: insufficient access control checks in ImageWatched (AWT, 8174098)

CVE-2017-10111 OpenJDK: incorrect range checks in LambdaFormEditor (Libraries, 8184185)

CVE-2017-10115 OpenJDK: DSA implementation timing attack (JCE, 8175106)

CVE-2017-10116 OpenJDK: LDAPCertStore following referrals to non-LDAP URLs (Security, 8176067)

CVE-2017-10135 OpenJDK: PKCS#8 implementation timing attack (JCE, 8176760)

CVE-2017-10193 OpenJDK: incorrect key size constraint check (Security, 8179101)

CVE-2017-10198 OpenJDK: incorrect enforcement of certificate path restrictions (Security, 8179998)

Sep
14

Acropolis Dynamic Scheduler (ADS) for AHV (Compute + Memory + Storage)

The Acropolis Dynamic Scheduler (ADS) ensures that compute (CPU and RAM) and storage resources are available for VMs and volume groups (VGs) in the Nutanix cluster. ADS, enabled by default, uses real-time statistics to determine:

Initial placement of VMs and VGs, specifically which AHV host runs a particular VM at power-on or a particular VG after creation.

Required runtime optimizations, including moving particular VMs and VGs to other AHV hosts to give all workloads the best possible access to resources.

If a problem is detected, a migration plan is created and executed, thereby eliminating hotspots in the cluster by migrating VMs from one host to another. Note that this feature only detects contentions that are currently in progress. You can monitor these tasks from the Task dashboard of the Prism web console. You can click the VM link to view the migration information, which includes the migration path (the destination AHV host).

The Acropolis block services feature uses the ADS feature for balancing sessions of the externally visible iSCSI targets.

Sep
14

Nutanix AFS and SMB 3.0

After you upgrade to the AFS (Acropolis File Services) 2.2 bits, you will have to manually change the max allowed protocol. This will be fixed with AFS 2.2.1, but here are the steps to get you going.

The following commands set the max protocol to the proper version:

# Check the current max protocol (still SMB2 after the upgrade)
scli smbcli get --section global --param "server max protocol"
[global]
server max protocol = SMB2

# Raise the max protocol to SMB 3.0
scli smbcli set --section global --param "server max protocol" --value SMB3_00
smb.conf update is successful

# Verify the change took effect
scli smbcli get --section global --param "server max protocol"
[global]
server max protocol = SMB3_00

Also note that if you want to run FSLogix and need an AFS share to have a block size of 512 bytes, this can be done; the default is 1024.

Sep
07

Windows Get Some Love with #Docker EE 17.06

With the new release of Docker EE 17.06, Windows containers get lots of added features. First up is the ability to run Windows and Linux worker nodes in the same cluster. This is great because you have centralized security and logging across your whole environment. Your .NET and Java teams can live in peace, and you can consolidate your infrastructure instead of spinning up separate environments.

Continuous scanning for vulnerabilities in Windows images was added if you have an EE Advanced license. Not only does it scan images, it also alerts you when new vulnerabilities are found in existing images.

Bringing everything together, you can use the same overlay networks to connect your application, for example SQL Server running on Windows and web servers running on Linux. Your developers can create a single Compose file covering both the SQL and web servers. A rough sketch of the same idea is below.
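
Here is a minimal sketch of that mixed-OS setup using Swarm services rather than a Compose file. The network name, image names and password are assumptions for illustration, not anything from the release notes:

# One overlay network shared by Windows and Linux services
docker network create -d overlay app-net

# SQL Server pinned to Windows workers (image name is an assumption)
docker service create --name db --network app-net \
  --constraint 'node.platform.os == windows' \
  -e ACCEPT_EULA=Y -e sa_password='placeholder' \
  microsoft/mssql-server-windows-developer

# Linux web front end on the same overlay network
docker service create --name web --network app-net \
  --constraint 'node.platform.os == linux' \
  -p 80:80 nginx:alpine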

Other New Windows related features in Docker 17.06:

Windows Server 2016 support
Windows 10586 is marked as deprecated; it will not be supported going forward in stable releases
Integration with Docker Cloud, with the ability to control remote Swarms from the local command line interface (CLI) and view your repositories
Unified login between the Docker CLI, Docker Hub and Docker Cloud
Sharing a drive can be done on demand, the first time a mount is requested
Add an experimental DNS name for the host: docker.for.win.localhost
Support for client (i.e. “login”) certificates for authenticating registry access (fixes docker/for-win#569)
New installer experience

Sep
06

Nutanix Native File Services (AFS) Now Supports AV Offload Scanning

With AFS 2.2 and AOS 5.1.2, AFS now supports ICAP (Internet Content Adaptation Protocol). ICAP, which is supported by a wide range of security vendors and products, is a standard protocol that allows file and web servers to be integrated with security products. Nutanix chose this method to give customers the ability to choose the antivirus solution that works best for their specific environment.

Following is the workflow for an ICAP-supported antivirus solution (a quick way to test an ICAP server from the command line is sketched after the list):
1. An SMB client submits a request to open or close a file.
2. The file server determines whether the file needs to be scanned, based on the metadata and virus scan policies. If a scan is needed, the file server sends the file to the ICAP server and issues a scan request.
3. The ICAP server scans the file and reports the scan results back to the file server.
4. The file server takes an action based on the scan results:
- If the file is infected, the file server quarantines it and returns an "access denied" message to the SMB client.
- If the file is clean, it returns the file handle to the SMB client.
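
If you want to poke at an ICAP server before pointing AFS at it, the open-source c-icap package ships a small test client. A hedged example follows; the hostname and service name are assumptions that depend on your antivirus vendor:

# Send a file to the ICAP antivirus service and print the server's verdict
c-icap-client -i icap-av.example.com -p 1344 -s avscan -f /tmp/suspect.docx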

The ICAP service runs on each AFS file server and can interact with more than one ICAP server in parallel to support horizontal scale-out of the antivirus servers. We recommend configuring two or more ICAP servers for production. The scale-out nature of AFS and one-click optimization greatly mitigate any antivirus scanning performance overhead. If scanning affects AFS file server VM performance, one-click optimization recommends increasing the virtual CPU resources or scaling out the file server VMs. This design allows both the ICAP servers and AFS to scale out, ensuring fast responses from the customer's antivirus vendor.

AFS sets scanning defaults across the entire file server, but they are disabled by default per share when you enable file scanning. You can enable scan on write and scan on read. Scan on write begins when the file is closed, and scan on read occurs when the file is opened. You can also exclude certain file types and files over a certain size. Share scan polices can override any defaults set for the file server.

For each ICAP server, we spin up no more than 10 parallel connections per FSVM and randomly dispatch the file scanning among all the ICAP servers. With heavier workloads, which may generate many scan requests and use all connections, the scan servers with more processing power scan more files. As soon as the current scan finishes, the next file is picked up from the queue, which keeps the number of active connections at 10.

Once AFS quarantines a file, the admin can rescan, unquarantine, or delete the file. Quarantined files can be searched if it is necessary to restore a file quickly.

If your antivirus vendor doesn't support ICAP, you can scan the shares by installing an antivirus agent onto a Windows machine and then mounting all the shares from the file server. This approach allows you to schedule scans during periods of low usage. At the desktop or client level, you can set your antivirus solution to scan on write or scan only when files are modified. You can configure high-security environments to scan inline for both reads and writes.

Sep
04

Multi-stage build support in #Docker EE 17.06

New in Docker EE 17.06 is the ability to have multi-stage builds. This is important because you can now grab just the files (artifacts) you need for the next stage of your build and keep your builds small, which leads to faster build times. This change allows you to have multiple FROM statements in your Dockerfile.

Developers can optionally give a build stage a name. This name can then be used in COPY --from=name src dest and in FROM name. If a build stage is defined with that name, it takes precedence in these commands; if it is not found, an image with that name is used instead.

# Stage 1: build and test the app in a full Node.js image
FROM node AS test-env
ADD ./ /app
WORKDIR /app
RUN npm install
RUN npm run build

# Stage 2: copy only the built artifacts into a slim nginx image
FROM nginx AS prod
COPY --from=test-env /app /var/www/html

You can run subsets of the Dockerfile to get more use out of your work. If you want to run only the test-env stage, you can add a target to the docker build command with --target test-env, as shown below.
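
For example, against the Dockerfile above (the image tags here are made up):

# Build and tag only the test-env stage
docker build --target test-env -t myapp:test .

# Build the whole Dockerfile; the final stage (prod) becomes the image
docker build -t myapp:prod .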