Logstash Filter Verifier

The Logstash program for collecting and processing logs is popular and commonly used to process e.g. syslog messages and HTTP logs.

Apart from ingesting log events and sending them to one or more destinations, it can transform the events in various ways, including extracting discrete fields from flat blocks of text, joining multiple physical lines into single logical events, parsing JSON and XML, and deleting unwanted events. It uses its own domain-specific configuration language to describe inputs, outputs, and the filters that should be applied to events.

Writing the filter configurations necessary to parse events isn't difficult for someone with basic programming skills, but verifying that the filters do what you expect can be tedious, especially when you tweak existing filters and want to make sure that all kinds of logs will continue to be processed as before. If you get something wrong you might have millions of incorrectly parsed events before you realize your mistake.

This is where Logstash Filter Verifier comes in. It lets you define test case files containing lines of input together with the expected output from Logstash. Pass one or more such test case files to Logstash Filter Verifier together with all of your Logstash filter configuration files and it'll run Logstash for you and verify that Logstash actually returns what you expect.

Before you can run Logstash Filter Verifier you need to compile it. After covering that, let's start with a simple example and follow up with reference documentation.

Building and installing

This program is written in the Go language and needs to be compiled before it can be run. Go compilers are available for most platforms that Logstash runs on, including Windows. Many Linux distributions make some version of the Go compiler easily installable, but otherwise you can download and install the latest version. You should be able to compile the source code with any reasonably up-to-date version of the Go compiler. The build process also requires GNU make.

To download and compile the source, run these commands (pick another directory name if you like):

$ mkdir ~/go
$ cd ~/go
$ export GOPATH=$(pwd)
$ go get github.com/magnusbaeck/logstash-filter-verifier
$ cd src/github.com/magnusbaeck/logstash-filter-verifier
$ make

If successful you'll find an executable in the current directory. The two last commands can be replaced with an invocation of the makefile with make.

The makefile can also be used to install Logstash Filter Verifier centrally. The default location is /usr/local/bin, but you can change that by modifying the PREFIX variable. For example, to install it in $HOME/bin (which is probably in your shell's path) you can issue the following command:

$ make install PREFIX=$HOME

Examples

The examples that follow build upon each other. Besides showing how to use Logstash Filter Verifier to test a particular kind of log, they highlight how to deal with different features in logs.

Syslog messages

Logstash is often used to parse syslog messages, so let's use that as a first example.

Test case files are in JSON format and contain a single object with a handful of supported properties.

{
  "fields": {
    "type": "syslog"
  },
  "input": [
    "Oct  6 20:55:29 myhost myprogram[31993]: This is a test message"
  ],
  "expected": [
    {
      "@timestamp": "2015-10-06T20:55:29.000Z",
      "host": "myhost",
      "message": "This is a test message",
      "pid": 31993,
      "program": "myprogram",
      "type": "syslog"
    }
  ]
}

In this example, the type field is set to "syslog", which means that the input events in this test case will have that value in their type field when they're passed to Logstash. Next, in input, we define a single test string that we want to feed through Logstash, and the expected array contains a single object with the event we expect Logstash to emit for the given input.

Note that UTC is the assumed timezone for input events to avoid different behavior depending on the timezone of the machine where Logstash Filter Verifier happens to run. This won't affect time formats that include a timezone.
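To make the UTC assumption concrete, here is a minimal Python sketch. It is not part of Logstash Filter Verifier; the helper name syslog_time_as_utc and the explicit year argument are assumptions made for illustration. It interprets a syslog timestamp that carries no timezone as UTC and renders it the way the expected event above spells @timestamp.

```python
from datetime import datetime, timezone


def syslog_time_as_utc(stamp: str, year: int) -> str:
    """Interpret a year-less syslog timestamp as UTC, formatted like @timestamp.

    Hypothetical helper for illustration only. The year has to be supplied
    by the caller because the syslog line itself does not contain one.
    """
    # A space in a strptime format matches any run of whitespace in the
    # input, so the double space in "Oct  6" parses with "%b %d".
    parsed = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    # Attach UTC explicitly instead of relying on the machine's timezone.
    as_utc = parsed.replace(tzinfo=timezone.utc)
    millis = as_utc.microsecond // 1000
    return as_utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"


print(syslog_time_as_utc("Oct  6 20:55:29", 2015))
```

Because the timezone is fixed rather than read from the environment, the result is the same on every machine, which is the property the test cases rely on.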

This command will run this test case file through Logstash Filter Verifier (replace all "path/to" with the actual paths to the files, obviously):

$ path/to/logstash-filter-verifier path/to/syslog.json path/to/filters

If the test is successful, Logstash Filter Verifier will terminate with a zero exit code and (almost) no output. If the test fails it'll run diff -u to compare the pretty-printed JSON representation of the expected and actual events.
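The comparison described above can be approximated in a few lines of Python. This is only a sketch of the idea, not the tool's implementation: the tool runs the external diff -u program, whereas this example uses Python's difflib, and sorting the keys before printing is an assumption made here so that field order cannot cause a spurious difference. The names pretty and event_diff are hypothetical.

```python
import difflib
import json


def pretty(event: dict) -> list[str]:
    """Pretty-print an event as lines of JSON with sorted keys."""
    return json.dumps(event, indent=2, sort_keys=True).splitlines()


def event_diff(expected: dict, actual: dict) -> str:
    """Return a unified diff of two events, or an empty string if they match."""
    lines = difflib.unified_diff(
        pretty(expected),
        pretty(actual),
        fromfile="expected",
        tofile="actual",
        lineterm="",
    )
    return "\n".join(lines)


expected = {"host": "myhost", "pid": 31993, "program": "myprogram"}
# Same event, except that the pid was left as a string by the filters.
actual = {"host": "myhost", "pid": "31993", "program": "myprogram"}
print(event_diff(expected, actual) or "events match")
```

A diff of the pretty-printed form points directly at the offending field, here a pid that was never converted to a number.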

The actual event emitted by Logstash will contain a @version field, but since that field isn't interesting it's ignored by default when reading the actual event. Hence we don't need to include it in the expected event either. Additional fields can be ignored with the ignore array property in the test case file (see details below).
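As a rough illustration of how ignored fields drop out of the comparison, the following Python sketch removes @version plus any caller-supplied field names from an event. The function name strip_ignored is hypothetical, and the real program may handle this differently.

```python
def strip_ignored(event: dict, ignore: list[str]) -> dict:
    """Return a copy of the event without @version and the ignored fields."""
    # @version is always dropped; the rest comes from the test case file.
    dropped = {"@version"} | set(ignore)
    return {name: value for name, value in event.items() if name not in dropped}


actual = {
    "@version": "1",
    "host": "buildserver17",
    "message": "This is a test message",
    "type": "app",
}
print(strip_ignored(actual, ["host"]))
```

The host value above is a made-up example of a field whose content depends on where the test runs.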

JSON messages

I always prefer to configure applications to emit JSON objects whenever possible so that I don't have to write complex and/or ambiguous grok expressions. Here's an example:

{"message": "This is a test message", "client": "127.0.0.1", "host": "myhost", "time": "2015-10-06T20:55:29Z"}

When you feed events like this to Logstash it's likely that the input used will have its codec set to "json". This is something we should mimic on the Logstash Filter Verifier side too. Use codec for that:

{
  "fields": {
    "type": "app"
  },
  "codec": "json",
  "ignore": ["host"],
  "input": [
    "{\"message\": \"This is a test message\", \"client\": \"127.0.0.1\", \"time\": \"2015-10-06T20:55:29Z\"}"
  ],
  "expected": [
    {
      "@timestamp": "2015-10-06T20:55:29.000Z",
      "client": "localhost",
      "clientip": "127.0.0.1",
      "message": "This is a test message",
      "type": "app"
    }
  ]
}

There are a few points to be made here:

The double quotes inside the string must be escaped.
The filters being tested here use Logstash's dns filter to transform the IP address in the "client" field into a hostname and copy the original IP address into the "clientip" field. To avoid future problems and flaky tests, pick a hostname or IP address for the test case that will always resolve to the same thing. As in this example, localhost and 127.0.0.1 should be safe picks.
If the input event doesn't contain a host field, Logstash will add such a field containing the name of the current host. To avoid test cases that behave differently depending on the host where they're run, we ignore that field with the ignore property.
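Escaping the inner double quotes by hand is error-prone. One way around it, sketched here in Python as a suggestion rather than a feature of Logstash Filter Verifier, is to generate the test case file from ordinary data structures and let the json module do the escaping. The variable names are arbitrary.

```python
import json

# The event as the application would log it, written as a plain dict.
event = {
    "message": "This is a test message",
    "client": "127.0.0.1",
    "time": "2015-10-06T20:55:29Z",
}

testcase = {
    "fields": {"type": "app"},
    "codec": "json",
    "ignore": ["host"],
    # Serialize the event to a string; the quotes inside it are escaped
    # automatically when the whole test case is serialized below.
    "input": [json.dumps(event)],
    "expected": [
        {
            "@timestamp": "2015-10-06T20:55:29.000Z",
            "client": "localhost",
            "clientip": "127.0.0.1",
            "message": "This is a test message",
            "type": "app",
        }
    ],
}

text = json.dumps(testcase, indent=2)
print(text)
```

Redirecting the output to a file yields a test case equivalent to the hand-written one above.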

Test case file reference

Test case files are JSON files containing a single object. That object may have the following properties:

codec: A string value naming the Logstash codec that should be used when events are read. This is normally "plain" or "json".
expected: An array of JSON objects with the events to be expected. They will be compared to the actual events produced by the Logstash process.
fields: An object containing the fields that all input messages should have. This is vital since filters typically are configured based on the event's type and/or tags. Scalar values (strings, numbers, and booleans) are supported, as are arrays of scalars. It seems Logstash doesn't support nested arrays.
ignore: An array with the names of the fields that should be removed from the events that Logstash emits. This is useful, for example, for dynamically generated fields whose contents can't be predicted and hardwired into the test case file.
input: An array with the lines of input (each line being a string) that should be fed to the Logstash process.
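The property list above translates into a small sanity check that can catch malformed test case files, such as a missing comma between properties, before Logstash is started. The Python sketch below is illustrative only: the function name check_testcase is hypothetical, and treating unknown properties as problems is an assumption of this sketch, not documented behavior of Logstash Filter Verifier.

```python
import json

KNOWN_PROPERTIES = {"codec", "expected", "fields", "ignore", "input"}


def check_testcase(text: str) -> list[str]:
    """Return a list of problems found in the contents of a test case file."""
    try:
        case = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]
    if not isinstance(case, dict):
        return ["top-level value must be a single object"]

    # Assumption of this sketch: anything outside the documented set is flagged.
    unknown = sorted(set(case) - KNOWN_PROPERTIES)
    problems = [f"unknown property: {name}" for name in unknown]
    for name in ("expected", "ignore", "input"):
        if name in case and not isinstance(case[name], list):
            problems.append(f"{name} must be an array")
    if "fields" in case and not isinstance(case["fields"], dict):
        problems.append("fields must be an object")
    if "codec" in case and not isinstance(case["codec"], str):
        problems.append("codec must be a string")
    return problems


# A missing comma after the "fields" object is reported as invalid JSON.
print(check_testcase('{"fields": {"type": "syslog"} "input": []}'))
```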

Known limitations and future work

Some log formats don't include all timestamp components. For example, most syslog formats don't include the year. This should be dealt with somehow.
JSON files are tedious for a human to write, with brackets, braces, double quotes, and escaped double quotes everywhere and no native support for comments. We should support YAML in addition to JSON to make it more pleasant to write test case files.
All Logstash processes are run serially. By running them in parallel the execution time can be reduced drastically on multi-core machines.
