Implementing Docker event monitoring from scratch


Docker's API provides a ton of functionality around containers and images, but one capability is easy to miss in the documentation: it can report host-wide events, including container events like die, restart, and out of memory. With a simple GET request, these events are available for processing.

I'll take a look at how we can tap into this functionality, and how we can convert raw data into meaningful dashboards and alerts.

A Simple Spike

First, I'll take a look at Docker's monitoring events API. From the documentation, I have two options: polling or streaming. I'll use streaming with a GET /events request. A successful 200 response returns JSON objects giving the status, id, from, and time of each container event. Here's a sample:

{"status": "create", "id": "dfdf82bd3881", "from": "ubuntu:latest", "time": 1374067924}
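To get a feel for the payload before bringing in any gems, here is a minimal sketch that parses the sample event with Ruby's standard JSON library and prints a one-line summary. The `describe_event` helper is a name I made up for illustration, and the sketch assumes the `time` field is a Unix timestamp in seconds.

```ruby
require 'json'

# The sample event from above, written with plain ASCII quotes so it parses as JSON.
SAMPLE_EVENT = '{"status": "create", "id": "dfdf82bd3881", "from": "ubuntu:latest", "time": 1374067924}'

# Turn one raw event string into a one-line summary.
# `describe_event` is a hypothetical helper, not part of any library.
def describe_event(raw)
  event = JSON.parse(raw)
  # Assumption: `time` is seconds since the Unix epoch.
  at = Time.at(event.fetch('time')).utc.strftime('%Y-%m-%d %H:%M:%S UTC')
  "#{at} #{event.fetch('status')} #{event.fetch('id')} (#{event.fetch('from')})"
end

puts describe_event(SAMPLE_EVENT)
```

Using `fetch` rather than `[]` makes a missing field fail loudly instead of silently printing an empty value.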

Luckily for us, Swipely's team has already released a great docker-api gem. It's a lightweight Ruby interface into the Docker API. The gem has a section on event streams and appears to do everything I need it to do. Let's install this gem and try it out.

After doing a gem install docker-api, I'll jump into IRB on my Vagrant VM and issue some commands.

Looks like everything is configured correctly and I have a great starting point.

Now, I want to check for events.

What did I do here? First, I set the :read_timeout parameter to 100 minutes. That will give us time to test events vs. timing out (default is one minute). Next, I set up a simple block of code to execute whenever an event arrives.
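The shape of that event loop can be sketched roughly as follows. To keep the sketch runnable without a Docker daemon, `watch_events` takes any object whose `stream` method yields events, so `format_event` and `watch_events` are hypothetical helpers of mine. The commented lines at the bottom show how I'd expect to wire in the docker-api gem, assuming `Docker.options` and `Docker::Event.stream` behave as described in its README.

```ruby
# Format one event as a single line. Works with any object that responds to
# status, id, from and time (the four fields the events API reports).
def format_event(event)
  "#{Time.at(event.time).utc} #{event.status} #{event.id} #{event.from}"
end

# Print every event the source yields.
# `source` is anything with a #stream method that yields event objects.
def watch_events(source, out: $stdout)
  source.stream { |event| out.puts format_event(event) }
end

# With the docker-api gem installed and a daemon running, the wiring would be
# roughly this (an assumption based on the gem's README, not verified here):
#
#   require 'docker'
#   Docker.options[:read_timeout] = 6000 # 100 minutes, in seconds
#   watch_events(Docker::Event)
```

Passing the source in as an argument also makes the handler easy to exercise with canned events before pointing it at a real host.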

In a new terminal tab - let's fire up a container:

docker run -it ubuntu date

…and in our original tab - we've got events!

Just to test again - let's start a longer running container:

docker run -it ubuntu /bin/bash

…and then in yet another terminal window, shut it down.

docker stop 3e2f58981df8

How can I take this further? How about a StatsD counter? A counter's job is to collect metrics over an interval and once that interval is complete - report the count of the metrics collected. After installing the statsd-ruby gem, I've whipped up a little script.
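To make the idea of a counter concrete, here is a small in-memory sketch of the collect-then-report cycle. `IntervalCounter` is a hypothetical class for illustration only. A real StatsD server does this bookkeeping for you and flushes on a timer.

```ruby
# Hypothetical in-memory counter showing the collect-then-report cycle.
class IntervalCounter
  def initialize
    @counts = Hash.new(0)
  end

  # Record one occurrence of the named metric in the current interval.
  def increment(name)
    @counts[name] += 1
  end

  # Report the counts gathered so far and start a fresh interval.
  def flush
    report = @counts
    @counts = Hash.new(0)
    report
  end
end

counter = IntervalCounter.new
%w[create start die create].each { |status| counter.increment(status) }
p counter.flush
```

After the flush, the next interval starts from zero, so each report only covers what happened since the previous one.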

What's the script doing? It's basically the same thing as the IRB commands above, except line #11 is stripping out some extra information. The Docker API reports exec_create and exec_start events - but adds the command that was executed (e.g. /bin/bash). I want to know how many exec_create events were reported, but I don't need to know the counts of each specific command.
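The stripping step can be sketched on its own. This is not the original script. It assumes the status arrives as the event name, a colon, then the command (for example `exec_create: /bin/bash`), and it builds the plain-text StatsD counter packet by hand instead of using the statsd-ruby gem. The `docker.events` prefix and the helper names are my own inventions.

```ruby
require 'socket'

# Keep only the event name, dropping anything after the first colon.
# Assumption: exec events look like "exec_create: /bin/bash".
def metric_name(status)
  status.to_s[/\A[^:]*/].strip
end

# Build a StatsD counter packet in the "bucket:value|c" wire format.
# The "docker.events" prefix is a made-up namespace for this sketch.
def counter_packet(status, prefix: 'docker.events')
  "#{prefix}.#{metric_name(status)}:1|c"
end

# Fire-and-forget the packet over UDP. 8125 is the conventional StatsD port.
def send_counter(status, host: '127.0.0.1', port: 8125)
  socket = UDPSocket.new
  socket.send(counter_packet(status), 0, host, port)
ensure
  socket&.close
end

puts counter_packet('exec_create: /bin/bash')
```

Collapsing every exec command into one bucket keeps the metric list short, which is the point of the stripping step.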

Bringing it home

Ok, I have to come clean. There's one issue with this script. The statsd gem is going to send the metrics, but how am I going to collect them? Well, I could use something like Graphite, but this post would be a LOT longer. I'm anxious to see these metrics so I'll use Scout instead.

I'll do a quick install on my VM:

curl -Sso scout_install.sh https://scoutapm.com/scout_install.sh
sudo /bin/bash ./scout_install.sh

Update my script to be executable:

chmod +x <SCRIPT_NAME>.rb

Run it, and fire off a few container events:

docker run -it ubuntu date
docker rm <CONTAINER_ID>

…then go check my Scout account. Docker events are showing up!

What else is cool? I can now add those metrics to my dashboards.

What else? I can create alerts on them. The next time any of my containers decide to puke and die, I can get an SMS message about their doomed state.

So what's next? I could run this script on every one of my containers - but that's not really the Docker way. Instead, Docker recommends that we create our own container running this script.

Wait a minute… since I'm already running Scout and I see that they've already got a Docker image for it - I'll just update the existing Scout container with the new script.

Now, all I have to do is follow the directions for starting docker-scout on my host, and everything is set to go.

TL;DR

The Docker Events API gives us a lot of visibility into the workings of a Dockerized host. With 17 lines of Ruby, StatsD, and Scout, we've got monitoring + alerting on those events.
