This package provides Streaming Approximate Histograms for efficient quantile approximations.
The histograms in this package are based on the algorithms in Ben-Haim & Yom-Tov's paper "A Streaming Parallel Decision Tree Algorithm". Histogram bins do not have a preset size. As values stream into the histogram, bins are dynamically added and merged.
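Below is a minimal sketch of the add-and-merge step. It assumes the update procedure from the Ben-Haim & Yom-Tov paper: each new value becomes a bin (a centroid with count 1), and whenever there are more bins than allowed, the two adjacent bins with the smallest gap are replaced by one bin at their count-weighted mean. The names `histogram`, `newHistogram`, `Add`, and `mergeClosest` are hypothetical and are not this package's API.

```go
package main

import (
	"fmt"
	"sort"
)

// bin is a centroid: the mean of the values merged into it and how many there were.
type bin struct {
	value float64
	count float64
}

// histogram is a hypothetical minimal type, not the gohistogram API.
type histogram struct {
	bins    []bin // kept sorted by value
	maxBins int
}

func newHistogram(maxBins int) *histogram {
	if maxBins < 1 {
		maxBins = 1 // merging needs room for at least one bin
	}
	return &histogram{maxBins: maxBins}
}

// Add inserts x in sorted position, then merges the closest adjacent
// pair of bins while there are more bins than allowed.
func (h *histogram) Add(x float64) {
	i := sort.Search(len(h.bins), func(j int) bool { return h.bins[j].value >= x })
	if i < len(h.bins) && h.bins[i].value == x {
		h.bins[i].count++ // exact match: just bump the count
		return
	}
	h.bins = append(h.bins, bin{})
	copy(h.bins[i+1:], h.bins[i:]) // shift the tail right by one
	h.bins[i] = bin{value: x, count: 1}
	for len(h.bins) > h.maxBins {
		h.mergeClosest()
	}
}

// mergeClosest replaces the two adjacent bins with the smallest gap by one bin
// at their count-weighted mean, holding their combined count.
func (h *histogram) mergeClosest() {
	best := 0
	for i := 1; i < len(h.bins)-1; i++ {
		if h.bins[i+1].value-h.bins[i].value < h.bins[best+1].value-h.bins[best].value {
			best = i
		}
	}
	a, b := h.bins[best], h.bins[best+1]
	n := a.count + b.count
	h.bins[best] = bin{value: (a.value*a.count + b.value*b.count) / n, count: n}
	h.bins = append(h.bins[:best+1], h.bins[best+2:]...)
}

func main() {
	h := newHistogram(4)
	for _, x := range []float64{1, 2, 3, 10, 11, 12, 50} {
		h.Add(x)
	}
	fmt.Println(h.bins) // [{2 3} {10.5 2} {12 1} {50 1}]
}
```

With a limit of four bins, the seven inputs collapse into four centroids. When two gaps are equal, this sketch merges the leftmost pair, which is an arbitrary tie-breaking choice.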
Another implementation can be found in the Apache Hive project (see NumericHistogram).
The exact method of calculating quantiles (such as percentiles) requires the data to be sorted. Streaming histograms make it possible to approximate quantiles without sorting, or even individually storing, the values.
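As an illustration, here is one simple way to read an approximate quantile from the bins alone, assuming bins sorted by value: walk the bins, subtract each count from q times the total, and return the centroid of the bin where the target is reached. The `bin` type and `quantile` function are hypothetical, and this step-wise walk is an assumption for brevity; the paper's procedures interpolate between neighbouring bins for finer estimates. The example bins summarize the seven values 1, 2, 3, 10, 11, 12, 50.

```go
package main

import (
	"fmt"
	"math"
)

// bin is a centroid with the number of values merged into it.
type bin struct {
	value float64
	count float64
}

// quantile returns an approximate q-quantile (0 <= q <= 1) from bins sorted by value.
// It needs only the bins, never the original values.
func quantile(bins []bin, q float64) float64 {
	if len(bins) == 0 {
		return math.NaN()
	}
	var total float64
	for _, b := range bins {
		total += b.count
	}
	remaining := q * total
	for _, b := range bins {
		remaining -= b.count
		if remaining <= 0 {
			return b.value
		}
	}
	return bins[len(bins)-1].value // q > 1 or rounding: clamp to the last bin
}

func main() {
	bins := []bin{{2, 3}, {10.5, 2}, {12, 1}, {50, 1}}
	fmt.Println(quantile(bins, 0.5)) // 10.5 (the exact median of the seven values is 10)
}
```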
NumericHistogram is the more basic implementation of a streaming histogram. WeightedHistogram implements bin values as exponentially weighted moving averages.
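One way to get exponentially weighted behaviour is to decay every bin's weight by a factor of (1 - alpha) on each insert, so a value seen k inserts ago carries weight (1 - alpha)^k. The sketch below assumes that approach; `weightedHistogram` and `alpha` are hypothetical names, and WeightedHistogram in this package may compute its averages differently.

```go
package main

import (
	"fmt"
	"sort"
)

type bin struct {
	value float64
	count float64 // decayed weight, not a raw count
}

// weightedHistogram is hypothetical; alpha is an assumed decay parameter.
type weightedHistogram struct {
	bins    []bin // kept sorted by value
	maxBins int
	alpha   float64 // in (0, 1): fraction of weight old bins lose per Add
}

func (h *weightedHistogram) Add(x float64) {
	// Decay every existing bin so older observations count for less.
	for i := range h.bins {
		h.bins[i].count *= 1 - h.alpha
	}
	i := sort.Search(len(h.bins), func(j int) bool { return h.bins[j].value >= x })
	if i < len(h.bins) && h.bins[i].value == x {
		h.bins[i].count++
	} else {
		h.bins = append(h.bins, bin{})
		copy(h.bins[i+1:], h.bins[i:])
		h.bins[i] = bin{value: x, count: 1}
	}
	// Merge the closest adjacent pair, weighting the new centroid by decayed counts.
	for len(h.bins) > h.maxBins && len(h.bins) > 1 {
		best := 0
		for j := 1; j < len(h.bins)-1; j++ {
			if h.bins[j+1].value-h.bins[j].value < h.bins[best+1].value-h.bins[best].value {
				best = j
			}
		}
		a, b := h.bins[best], h.bins[best+1]
		n := a.count + b.count
		v := (a.value + b.value) / 2 // fallback if both weights have decayed to zero
		if n > 0 {
			v = (a.value*a.count + b.value*b.count) / n
		}
		h.bins[best] = bin{value: v, count: n}
		h.bins = append(h.bins[:best+1], h.bins[best+2:]...)
	}
}

func main() {
	h := &weightedHistogram{maxBins: 8, alpha: 0.5}
	for _, x := range []float64{1, 2, 3} {
		h.Add(x)
	}
	fmt.Println(h.bins) // [{1 0.25} {2 0.5} {3 1}]
}
```

The most recent value keeps full weight while earlier ones fade geometrically, so quantiles read from these bins favour recent data.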
The maximum number of bins is passed as an argument to the constructor functions. More bins yield more accurate approximations at the cost of higher memory use and slower updates.
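The sketch below shows why this limit bounds memory: however long the stream, the histogram never holds more than the configured number of bins. The `add` function is a hypothetical, compact version of the merge step from the paper, not this package's constructor or `Add` method. It prints only the number of bins kept, since accuracy depends on the data.

```go
package main

import (
	"fmt"
	"sort"
)

type bin struct{ value, count float64 }

// add inserts x into bins (sorted by value) and merges the closest adjacent
// pair until at most maxBins remain. maxBins is assumed to be at least 1.
func add(bins []bin, maxBins int, x float64) []bin {
	i := sort.Search(len(bins), func(j int) bool { return bins[j].value >= x })
	if i < len(bins) && bins[i].value == x {
		bins[i].count++
		return bins
	}
	bins = append(bins, bin{})
	copy(bins[i+1:], bins[i:])
	bins[i] = bin{x, 1}
	for len(bins) > maxBins {
		best := 0
		for j := 1; j < len(bins)-1; j++ {
			if bins[j+1].value-bins[j].value < bins[best+1].value-bins[best].value {
				best = j
			}
		}
		a, b := bins[best], bins[best+1]
		n := a.count + b.count
		bins[best] = bin{(a.value*a.count + b.value*b.count) / n, n}
		bins = append(bins[:best+1], bins[best+2:]...)
	}
	return bins
}

func main() {
	for _, maxBins := range []int{4, 64} {
		var bins []bin
		for x := 0; x < 10000; x++ {
			bins = add(bins, maxBins, float64(x))
		}
		fmt.Printf("maxBins=%d: kept %d bins for 10000 values\n", maxBins, len(bins))
	}
}
```

Each merge sums the two counts, so the total count is preserved even though the individual values are gone.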
Install the package:
$ go get github.com/VividCortex/gohistogram
Then import it:
import "github.com/VividCortex/gohistogram"
Get the code into your workspace:
$ cd $GOPATH
$ git clone git@github.com:VividCortex/gohistogram.git ./src/github.com/VividCortex/gohistogram
You can run the tests now:
$ cd src/github.com/VividCortex/gohistogram
$ go test .
Full source documentation is available in the package's Go documentation (for example, run go doc github.com/VividCortex/gohistogram).
Copyright (c) 2013 VividCortex
Released under the MIT License. See the LICENSE file for details.