Running IT: Using Docker to Scale Operational Intelligence at Splunk

I am trying to get back into running and would like to complete a marathon at some point, but I am taking it slow. Last time I tried, I got up to 36 km, but my knees and feet didn’t fare so well. With that being said, I am going to have some time on the treadmill and elliptical, and one way I can be of service is to tell you the important parts of the videos I watch and hopefully give you back some time as well. The first video I picked was Using Docker to Scale Operational Intelligence at Splunk from DockerCon Europe 2017.

I was kinda hoping for Splunk and Docker integration, but the talk was more about how Splunk is using Docker. Interestingly, Splunk does have a good use case for needing both Windows and Linux nodes for testing. When I first heard that Docker could have both Linux and Windows hosts in the same Swarm cluster, I thought that was cool but didn’t really know how much it would be used.

Skip to 21:33
– The first half is mainly a review of Docker EE and why Splunk picked Docker. I am sure most have heard similar stories. There is mention of RBAC for Nodes, which allows secure multi-tenancy across multiple teams through node-based isolation and segregation. At the time of the video Splunk wasn’t using it, but it would have made life easier.

Interesting Notes from the Session

Splunk started with a bare-metal test lab of 150 nodes using UCP and is now running over 600 servers.

Splunk 7 added a new feature, Metrics. Metrics is a feature for system administrators and IT tools engineers that focuses on collecting, investigating, monitoring, and sharing metrics from your technology infrastructure, security systems, and business applications in real time. Splunk is using collectd to get metrics data into Splunk, and it also grabs the logs so both can be searched from the same interface.

Splunk is using 1 container per server for performance testing.

For test/dev testing, Splunk uses 20-30 containers per server.

Splunk is running a complicated system to make sure performance and test/dev containers don’t intermix. Splunk is hoping to use the new RBAC for nodes and CPU/memory reservations to clean up the CI workflow.
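To make the idea of keeping performance and test/dev containers apart a bit more concrete, here is a rough Python sketch of the placement rule as I understood it from the talk. This is not Splunk's system and not the Docker or UCP API. The `Node` class, the `pool` label values ("perf" and "testdev"), and the `place` function are all names I made up for illustration, and the node sizes are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A server in the lab. All fields are hypothetical, for illustration only."""
    name: str
    pool: str        # made-up label: "perf" or "testdev"
    cpus: float      # CPU cores not yet reserved
    mem_gb: float    # memory (GB) not yet reserved
    containers: list = field(default_factory=list)


def place(nodes, container, pool, cpus, mem_gb):
    """Place a container on the first suitable node and return that node's name.

    Rules sketched here (assumptions, not Splunk's real scheduler):
      * a container only lands on a node in its own pool, so performance
        and test/dev workloads never intermix;
      * a "perf" node runs at most one container;
      * the node must have enough unreserved CPU and memory left.
    Returns None when no node qualifies.
    """
    for node in nodes:
        if node.pool != pool:
            continue
        if pool == "perf" and node.containers:
            continue  # performance servers are exclusive
        if node.cpus >= cpus and node.mem_gb >= mem_gb:
            node.cpus -= cpus          # record the reservation
            node.mem_gb -= mem_gb
            node.containers.append(container)
            return node.name
    return None


if __name__ == "__main__":
    lab = [Node("perf-1", "perf", 32, 128), Node("dev-1", "testdev", 32, 128)]
    print(place(lab, "bench-a", "perf", 16, 64))
    print(place(lab, "bench-b", "perf", 1, 1))
    for i in range(25):
        place(lab, f"ci-{i}", "testdev", 1, 4)
    print(len(lab[1].containers))
```

In a real Swarm or UCP cluster the scheduler does this work for you through node labels, placement constraints, and resource reservations. The sketch only shows why reservations plus node isolation would replace a hand-built system for keeping the two workloads apart.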

Splunk is moving away from Jenkins for CI, using UCP to get away from agents and run over 2,000 concurrent jobs.
