rhyolight/omg-monitor

OMG Monitor

This program uses NuPIC to catch anomalies in response times monitored by Pingdom. It runs as a Docker container, so using it only requires running the container (see Usage).

Here is a simplified flowchart of the project:

flowchart

Input

The inputs are response times gathered by Pingdom. We use the python-restful-pingdom module to get the Pingdom data used to train the NuPIC model. The model is first trained with the last 1440 response times of the specified checks, after which it starts learning online, making a request per minute per check to Pingdom.

Output

NuPIC calculates an anomaly score and an anomaly likelihood for each input, which are stored with some input fields in a Redis server. The data stored in Redis are lists of strings of the form "time,status,actual,predicted,anomaly,likelihood" with keys of the form "results:[CHECK_ID]".
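
As an illustration of the storage format described above, here is a minimal Python sketch that builds the Redis key for a check and parses one stored string into named fields. The names `Result`, `results_key` and `parse_result` are hypothetical and not part of the project, and the numeric types (integer time, floating-point values) are an assumption about how the fields are encoded.

```python
from typing import NamedTuple


class Result(NamedTuple):
    """One entry of a "results:[CHECK_ID]" list (field types are assumed)."""
    time: int
    status: str
    actual: float
    predicted: float
    anomaly: float
    likelihood: float


def results_key(check_id):
    """Return the Redis key holding the result list for one check."""
    return "results:%s" % check_id


def parse_result(line):
    """Split a "time,status,actual,predicted,anomaly,likelihood" string."""
    time, status, actual, predicted, anomaly, likelihood = line.split(",")
    return Result(int(time), status, float(actual), float(predicted),
                  float(anomaly), float(likelihood))
```

For example, `parse_result("1400000000,up,312,305.5,0.1,0.5").status` evaluates to `"up"`; the sample values are made up for illustration.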

API

The results can be accessed via a RESTful API written in Go using Martini.

It is very simple and exposes two endpoints:

  • To get the checks available in the supplied Pingdom account:
/checks

The JSON string returned contains only the IDs and names of the checks.

  • To get the last [N] results for [CHECK_ID]:
/results/[CHECK_ID]?limit=[N]

The resulting JSON string has the following fields:

  • actual: Actual response time at the given time instant.
  • predicted: Response time prediction for the given time instant.
  • anomaly: Anomaly score, i.e. how unlikely the actual result is in comparison with the predicted result.
  • likelihood: The likelihood that the last anomaly score follows the historical probability distribution.
  • status: Status of the server as returned by Pingdom (up, down, unconfirmed_down).
  • time: UNIX time when Pingdom got the actual response time.

If no limit is specified, N=0 is assumed and the API returns all the results for the given CHECK_ID.
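
The two endpoints can be addressed from a client roughly as follows. This is a sketch only: the helper names `checks_url` and `results_url` are hypothetical, and the base URL depends on the host and on the [PUBLIC_PORT] chosen when starting the container.

```python
from urllib.parse import urlencode


def checks_url(base_url):
    """URL listing the IDs and names of the available checks."""
    return base_url.rstrip("/") + "/checks"


def results_url(base_url, check_id, limit=None):
    """URL for the results of one check.

    When limit is None the query string is omitted, which the API
    treats as N=0 and answers with all results for the check.
    """
    url = "%s/results/%s" % (base_url.rstrip("/"), check_id)
    if limit is not None:
        url += "?" + urlencode({"limit": limit})
    return url
```

With a hypothetical base URL of `http://localhost:8080`, `results_url("http://localhost:8080", 12345, 10)` gives `http://localhost:8080/results/12345?limit=10`.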

API Client

The Go server also serves static HTML files that use jQuery to query our API and dynamically plot the results. Currently we have three visualizations.

See the Screenshots section for some examples.

Usage

With Docker installed, do:

sudo docker run -d -p [PUBLIC_PORT]:5000 allanino/monitor [USERNAME] [PASSWORD] [APPKEY] [CHECK_ID_1] [CHECK_ID_2] ...

The parameters are:

  • [PUBLIC_PORT]: Host port on which the Go server is published (the server listens on port 5000 inside the container).

  • [USERNAME] [PASSWORD] [APPKEY]: Pingdom's credentials: username, password and app-key.

  • (Optional) [CHECK_ID_1] [CHECK_ID_2] ...: Pingdom's IDs to monitor. If we don't specify any IDs, the monitor will use all available IDs from the given Pingdom account.

Q&A

What happens when the container is started?

The Docker container entrypoint is the script startup.sh, which starts the Redis and Go servers and runs the start.py script. start.py starts one monitor.py instance per check, each in a separate thread (running in parallel). The script monitor.py contains the NuPIC code. start.py also fetches the available checks from the given Pingdom account and saves them in the Redis server under the key "checks", whose value is a list of strings with the IDs. It also creates keys of the form "check:[CHECK_ID]" that store the corresponding check's name. All that information is used by our API.

As soon as the Go server is available, we can see the logs generated by Redis, Martini and the monitors at the URL /log.
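
The one-thread-per-check pattern described above can be sketched in Python as follows. This is not the project's start.py: `start_monitors` and `name_key` are hypothetical names, and `monitor` stands in for whatever function runs the NuPIC code for a single check.

```python
import threading


def name_key(check_id):
    """Redis key that stores the name of one check."""
    return "check:%s" % check_id


def start_monitors(check_ids, monitor):
    """Start monitor(check_id) in its own thread for every check."""
    threads = []
    for check_id in check_ids:
        thread = threading.Thread(target=monitor, args=(check_id,),
                                  name="monitor-%s" % check_id)
        # Daemon threads do not keep the process alive on shutdown.
        thread.daemon = True
        thread.start()
        threads.append(thread)
    return threads
```

The returned list lets the caller `join()` the threads or inspect which monitors are still alive.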

How to build the image?

The docker run command above pulls the image allanino/monitor from the Docker index. That image is kept up to date through Docker's Trusted Build feature.

To build the Docker image locally, clone this repository and do:

sudo docker build -t "[USERNAME]/monitor" .

Note that our Dockerfile uses the numenta/nupic image, as that image already contains a NuPIC installation.

Screenshots

gauge

anomaly

likelihood

License

OMG Monitor
Copyright (C) 2014 Cloudwalk

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.

About

A Docker container that runs NuPIC to catch anomalies in response times of servers monitored by Pingdom.
