
wilseypa/rphash-golang


RPHash


RPHash takes clustering and unsupervised learning problems and solves them in an embarrassingly parallel manner.

Clustering is a core technique in data analysis, but it runs into problems with scalability and high dimensionality, with ever-changing environments and compatibility, and with insecure communication and data movement.

RPHash aims to provide secure, reliable, and fast clustering of data for large-scale distributed systems.

Random Projection Hash (RPHash)

The algorithm was created to maximize parallel computation while scaling to large deployments. It is suitable for high-dimensional data sets and is streamlined.
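To make the idea of a random projection hash concrete, here is a minimal sketch in Go. It is not taken from this repository: the `Projector` type, `NewProjector`, and `Hash` are hypothetical names, and the sign-bit hashing step is a common locality-sensitive hashing technique used here for illustration, which may differ from what RPHash does internally. A vector is projected onto a few random directions, and the sign of each projected coordinate becomes one bit of the hash, so nearby vectors tend to share a hash.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Projector holds a random k x dim matrix (hypothetical type, for illustration only).
type Projector struct {
	rows [][]float64
}

// NewProjector draws a k x dim Gaussian matrix from a seeded source so runs are repeatable.
// k must be at most 64 because the hash is packed into a uint64.
func NewProjector(dim, k int, seed int64) *Projector {
	rng := rand.New(rand.NewSource(seed))
	rows := make([][]float64, k)
	for i := range rows {
		rows[i] = make([]float64, dim)
		for j := range rows[i] {
			rows[i][j] = rng.NormFloat64()
		}
	}
	return &Projector{rows: rows}
}

// Hash projects v onto each random row and stores the sign of the result as one bit.
// v is assumed to have the same length as the rows.
func (p *Projector) Hash(v []float64) uint64 {
	var h uint64
	for i, row := range p.rows {
		var dot float64
		for j, x := range v {
			dot += row[j] * x
		}
		if dot >= 0 {
			h |= 1 << uint(i)
		}
	}
	return h
}

func main() {
	p := NewProjector(4, 8, 42)
	a := []float64{1, 2, 3, 4}
	b := []float64{2, 4, 6, 8} // same direction as a, twice the length
	fmt.Println(p.Hash(a) == p.Hash(b)) // prints true
}
```

Because each point is hashed independently of every other point, this step can be run on separate shards of the data without coordination.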

Overview

Installing

Ensure you have Go, git, and mercurial installed on your system. Additionally, ensure that your Go environment is set up.

go get github.com/wilseypa/rphash-golang
# or, clone from source
git clone https://github.com/wilseypa/rphash-golang.git

API

rphash-golang                         # Streaming command for clustering
  --num.clusters <#>                  # Number of clusters -> output centroids
  --num.shards <#>                    # Number of shards on the data
  --local.file <filename>             # Filename to cluster
  --cluster <rphash|streaming-kmeans> # Cluster algorithm
  --centroid.plots                    # Enable plots
  --centroid.plots.file <filename>    # Output dimension plot path
  --centroid.paint <filename>         # Output of an NxN matrix (experimental)
  --centroid.heat <filename>          # Output of a 3D heatmap (experimental)
  --hdfs.enable                       # Enable HDFS
  --hdfs.dir                          # HDFS directory
  [glow flags]                        # All other glow flags
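The `--num.shards` flag splits the input into shards that are processed in parallel. The following Go sketch illustrates that pattern in general terms; it is an assumption-laden example, not the tool's implementation (which builds on glow). The `partial` type and `shardMean` function are hypothetical, and a plain mean stands in for the real per-shard clustering work. Each shard is reduced in its own goroutine, and the partial results are merged at the end.

```go
package main

import (
	"fmt"
	"sync"
)

// partial is a per-shard result: coordinate sums and a point count (hypothetical type).
type partial struct {
	sum   []float64
	count int
}

// shardMean splits points into numShards contiguous shards, sums each shard in its
// own goroutine, then merges the partial results into a single mean vector.
// All points are assumed to have the same dimension as points[0].
func shardMean(points [][]float64, numShards int) []float64 {
	if len(points) == 0 || numShards < 1 {
		return nil
	}
	dim := len(points[0])
	parts := make([]partial, numShards)
	var wg sync.WaitGroup
	for s := 0; s < numShards; s++ {
		lo := s * len(points) / numShards
		hi := (s + 1) * len(points) / numShards
		wg.Add(1)
		go func(s int, shard [][]float64) {
			defer wg.Done()
			p := partial{sum: make([]float64, dim)}
			for _, pt := range shard {
				for j, x := range pt {
					p.sum[j] += x
				}
				p.count++
			}
			parts[s] = p // each goroutine writes only its own slot
		}(s, points[lo:hi])
	}
	wg.Wait()

	// Merge step: combine the per-shard sums and counts.
	mean := make([]float64, dim)
	total := 0
	for _, p := range parts {
		for j, x := range p.sum {
			mean[j] += x
		}
		total += p.count
	}
	for j := range mean {
		mean[j] /= float64(total)
	}
	return mean
}

func main() {
	points := [][]float64{{1, 1}, {3, 3}, {5, 5}, {7, 7}}
	fmt.Println(shardMean(points, 2)) // prints [4 4]
}
```

The shards share no state while they run, which is what makes the work embarrassingly parallel; only the small merge step at the end is sequential.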

Test

go test ./tests -v -bench=.

Developers

  • Sam Wenke (wenkesj)
  • Jacob Franklin (frankljbe)

Documentation

  • Sadiq Quasem (quasemsm)

About

Scalable Big Data Clustering by Random Projection Hashing: golang implementation
