quasemsm/rphash-golang

 
 


Scalable Big Data Clustering by Random Projection Hashing

Installing and Testing

git clone https://github.com/wenkesj/rphash

or

go get github.com/wenkesj/rphash
sh rphash/install

Test

cd rphash/tests
go test -v -bench=.

API

This is the official API documentation for RPHash, a high-performance clustering algorithm for big data.

type RPHashObject

The SimpleArray struct is one implementation of RPHashObject.

import "github.com/wenkesj/rphash/reader/simplearray"
type SimpleArray struct {
  data types.Iterator
  dimension int
  numberOfProjections int
  decoderMultiplier int
  randomSeed int64
  hashModulus int64
  k int
  numberOfBlurs int
  decoder types.Decoder
  centroids [][]float64
  topIDs []int64
}

func NewSimpleArray

func NewSimpleArray(X [][]float64, k int) *SimpleArray

NewSimpleArray returns a new RPHashObject for the data X and the K value k.

type Simple

import "github.com/wenkesj/rphash/simple"
type Simple struct {
  centroids [][]float64
  variance float64
  rphashObject RPHashObject
}

func NewSimple

func NewSimple(_rphashObject RPHashObject) *Simple

NewSimple returns an instance of the Simple struct.

func (*Simple) Map

func (this *Simple) Map() RPHashObject

Map maps all the default tasks to the RPHashObject, then updates and returns the RPHashObject.

func (*Simple) Reduce

func (this *Simple) Reduce() RPHashObject

Reduce performs all the default tasks on the RPHashObject, then updates and returns the RPHashObject.

func (*Simple) GetCentroids

func (this *Simple) GetCentroids() [][]float64

GetCentroids runs k-means on the centroids held by the Simple struct, using the K value of its RPHashObject, and returns the resulting centroids.
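
To make the k-means step concrete, here is a minimal sketch of Lloyd's algorithm in plain Go. It is a generic illustration, not the library's implementation: the names sqDist and kMeans, the fixed iteration count, and the choice to seed with the first k points are all assumptions made for this example.

```go
package main

import "math"

// sqDist returns the squared Euclidean distance between a and b.
func sqDist(a, b []float64) float64 {
	s := 0.0
	for i := range a {
		d := a[i] - b[i]
		s += d * d
	}
	return s
}

// kMeans runs Lloyd's algorithm for a fixed number of iterations.
// Assumption for this sketch: the first k points seed the centroids,
// and 0 < k <= len(points).
func kMeans(points [][]float64, k, iterations int) [][]float64 {
	dim := len(points[0])
	centroids := make([][]float64, k)
	for i := range centroids {
		centroids[i] = append([]float64(nil), points[i]...)
	}
	for it := 0; it < iterations; it++ {
		sums := make([][]float64, k)
		for i := range sums {
			sums[i] = make([]float64, dim)
		}
		counts := make([]int, k)
		// Assignment step: attach every point to its nearest centroid.
		for _, p := range points {
			best, bestDist := 0, math.Inf(1)
			for c, centroid := range centroids {
				if d := sqDist(p, centroid); d < bestDist {
					best, bestDist = c, d
				}
			}
			counts[best]++
			for j, v := range p {
				sums[best][j] += v
			}
		}
		// Update step: move each centroid to the mean of its points.
		for c := range centroids {
			if counts[c] == 0 {
				continue // an empty cluster keeps its previous centroid
			}
			for j := range centroids[c] {
				centroids[c][j] = sums[c][j] / float64(counts[c])
			}
		}
	}
	return centroids
}
```

Calling kMeans with the points (0,0), (10,10), (0,2), (10,12) and k = 2 settles on the centroids (0,1) and (10,11) after the first iteration.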

func (*Simple) Run

func (this *Simple) Run()

Run calls the Map and Reduce functions and updates the centroids.
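
The two-pass idea behind random projection hashing can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in this repository: a sign-bit hash stands in for the library's decoder, and the function names (newProjection, bucketID, countBuckets, topBuckets, averageBuckets) are hypothetical. How the two passes line up with Map and Reduce in the library is also an assumption.

```go
package main

import (
	"math/rand"
	"sort"
)

// newProjection returns `bits` random Gaussian directions of length dim.
func newProjection(bits, dim int, seed int64) [][]float64 {
	rng := rand.New(rand.NewSource(seed))
	dirs := make([][]float64, bits)
	for i := range dirs {
		dirs[i] = make([]float64, dim)
		for j := range dirs[i] {
			dirs[i][j] = rng.NormFloat64()
		}
	}
	return dirs
}

// bucketID hashes vec to an ID with one sign bit per random direction.
func bucketID(vec []float64, dirs [][]float64) int64 {
	var id int64
	for i, dir := range dirs {
		dot := 0.0
		for j, v := range vec {
			dot += v * dir[j]
		}
		if dot > 0 {
			id |= 1 << uint(i)
		}
	}
	return id
}

// countBuckets is the first pass: count the vectors landing in each bucket.
func countBuckets(data, dirs [][]float64) map[int64]int {
	counts := make(map[int64]int)
	for _, vec := range data {
		counts[bucketID(vec, dirs)]++
	}
	return counts
}

// topBuckets returns the k most frequent bucket IDs (ties: smaller ID first).
func topBuckets(counts map[int64]int, k int) []int64 {
	ids := make([]int64, 0, len(counts))
	for id := range counts {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(a, b int) bool {
		if counts[ids[a]] != counts[ids[b]] {
			return counts[ids[a]] > counts[ids[b]]
		}
		return ids[a] < ids[b]
	})
	if len(ids) > k {
		ids = ids[:k]
	}
	return ids
}

// averageBuckets is the second pass: average the vectors in each top bucket.
func averageBuckets(data, dirs [][]float64, top []int64) [][]float64 {
	index := make(map[int64]int, len(top))
	for i, id := range top {
		index[id] = i
	}
	sums := make([][]float64, len(top))
	n := make([]float64, len(top))
	for _, vec := range data {
		i, ok := index[bucketID(vec, dirs)]
		if !ok {
			continue // vector is not in one of the top buckets
		}
		if sums[i] == nil {
			sums[i] = make([]float64, len(vec))
		}
		for j, v := range vec {
			sums[i][j] += v
		}
		n[i]++
	}
	for i := range sums {
		for j := range sums[i] {
			sums[i][j] /= n[i]
		}
	}
	return sums
}
```

The first pass only needs a counter per bucket, so it never holds the full data set in memory; the second pass revisits the data and keeps one running sum per top bucket. The averages it returns are candidate centroids that a k-means step could then refine.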

func (*Simple) GetParam

func (this *Simple) GetParam() RPHashObject

Returns the RPHashObject of the Simple struct.

type Stream

import "github.com/wenkesj/rphash/stream"
type Stream struct {
  counts []int64
  centroids [][]float64
  variance float64
  centroidCounter types.CentroidItemSet
  random *rand.Rand
  rphashObject types.RPHashObject
  lshGroup []types.LSH
  decoder types.Decoder
  projector types.Projector
  hash types.Hash
  varTracker types.StatTest
}

func NewStream

func NewStream(_rphashObject types.RPHashObject) *Stream

func (*Stream) AddVectorOnlineStep

func (this *Stream) AddVectorOnlineStep(vec []float64) int64

func (*Stream) GetCentroids

func (this *Stream) GetCentroids() [][]float64

func (*Stream) GetCentroidsOfflineStep

func (this *Stream) GetCentroidsOfflineStep() [][]float64
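
The split between an online step and an offline step can be illustrated with a toy type that reuses the two method names above. Everything else here is hypothetical: toyStream, newToyStream, and signHash are invented for this sketch, the coordinate-sign hash is a placeholder for the library's projection and decoding, and returning the bucket ID from AddVectorOnlineStep is an assumption about what the int64 means.

```go
package main

import "sort"

// toyStream is a hypothetical stand-in for a streaming clusterer. It mirrors
// the method names listed above but not the library's internals.
type toyStream struct {
	k      int
	counts map[int64]int64
	means  map[int64][]float64
}

func newToyStream(k int) *toyStream {
	return &toyStream{k: k, counts: map[int64]int64{}, means: map[int64][]float64{}}
}

// signHash is a placeholder hash: one bit per coordinate, set when positive.
func signHash(vec []float64) int64 {
	var id int64
	for i, v := range vec {
		if v > 0 && i < 63 {
			id |= 1 << uint(i)
		}
	}
	return id
}

// AddVectorOnlineStep folds one vector into its bucket's running mean and
// returns the bucket ID (an assumption about the real return value).
func (s *toyStream) AddVectorOnlineStep(vec []float64) int64 {
	id := signHash(vec)
	s.counts[id]++
	mean, ok := s.means[id]
	if !ok {
		mean = make([]float64, len(vec))
		s.means[id] = mean
	}
	n := float64(s.counts[id])
	for j, v := range vec {
		mean[j] += (v - mean[j]) / n // incremental mean update
	}
	return id
}

// GetCentroidsOfflineStep returns the running means of the k fullest buckets.
func (s *toyStream) GetCentroidsOfflineStep() [][]float64 {
	ids := make([]int64, 0, len(s.counts))
	for id := range s.counts {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(a, b int) bool {
		if s.counts[ids[a]] != s.counts[ids[b]] {
			return s.counts[ids[a]] > s.counts[ids[b]]
		}
		return ids[a] < ids[b]
	})
	if len(ids) > s.k {
		ids = ids[:s.k]
	}
	out := make([][]float64, len(ids))
	for i, id := range ids {
		out[i] = append([]float64(nil), s.means[id]...)
	}
	return out
}
```

In this sketch the online step does constant work per vector and stores no raw data, while the offline step can be called at any point to read out the current centroids.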

func (*Stream) Run

func (this *Stream) Run()
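
Simple and Stream both list Run() and GetCentroids() [][]float64, so calling code could be written against a small interface that either type would satisfy. The sketch below assumes that much and nothing more: the interface name clusterer, the helper cluster, and the stub fixedClusterer are hypothetical, and whether Run has to be called before GetCentroids in the library is an assumption.

```go
package main

// clusterer is a hypothetical interface covering the two methods that the
// signatures above list for both *Simple and *Stream.
type clusterer interface {
	Run()
	GetCentroids() [][]float64
}

// cluster runs c and then returns its centroids.
func cluster(c clusterer) [][]float64 {
	c.Run()
	return c.GetCentroids()
}

// fixedClusterer is a stub used only to exercise cluster in this sketch.
type fixedClusterer struct {
	ran       bool
	centroids [][]float64
}

// Run records that the stub has been run.
func (f *fixedClusterer) Run() { f.ran = true }

// GetCentroids returns the canned centroids, or nil if Run was never called.
func (f *fixedClusterer) GetCentroids() [][]float64 {
	if !f.ran {
		return nil
	}
	return f.centroids
}
```

Writing the caller against the interface keeps the choice between the batch and streaming variants in one place, and a stub like fixedClusterer makes that caller easy to test without real data.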
