
jonpugh/maggiefs

 
 


maggiefs -- A fully POSIX-compliant distributed filesystem

Design

Architecturally, MaggieFS is very similar to the Hadoop Distributed Filesystem (HDFS), with a model of one nameserver and N dataservers (the counterparts of HDFS's namenode and datanodes). It adds a third component, the leaseserver, which is co-located with the nameserver and is responsible for establishing happens-before relationships, so that consistency can be guaranteed across the cluster when files are changed.

The client is implemented on the FUSE low-level API: MaggieFS presents a tree model to the kernel, and the kernel walks that tree to resolve paths, then reads and writes file data by inode ID. In terms of performance, the kernel's readahead and VFS cache routines should benefit MaggieFS, likely making it faster than HDFS for random reads and equivalent for streaming reads, once MaggieFS itself is free of glaring performance holes.

The goal of this filesystem is to provide POSIX semantics and full read-write capability, to be faster than HDFS in some situations, and to take up fewer system resources.

To Install

The mfs binary is responsible for running the nameserver and dataserver processes, as well as the client.

To install, first set up your $GOPATH according to standard Go project conventions.

mfs depends on the LevelDB library, version 1.9.0. On Ubuntu 13.04, it should be available via:

apt-get install libleveldb-dev

If your package manager doesn't have a recent enough version of LevelDB, follow the mfs LevelDB installation instructions to set up Go to build against a downloaded version of the library.

Finally, run:

go get github.com/jbooth/maggiefs/mfs  
go install github.com/jbooth/maggiefs/mfs 

And you'll have the mfs binary in $GOPATH/bin.

To Run

The mfs binary has four modes of operation (and a couple of utilities).

mfs singlenode [numDatanodes] [volumesPerDN] [replicationFactor] [baseDir for data] [mountPoint]

mfs singlenode runs a mock cluster by building out directories under a base directory (typically a temp directory). It's useful for testing or test-driving. For example, to run a mock cluster with 3 DNs, 1 volume each, and a replicationFactor of 2, run:

mfs singlenode 3 1 2 /tmp/maggiefsData /tmp/maggiefsMount

mfs nameserver [configPath]

Runs the nameserver.

mfs dataserver [configPath] [localMountPoint]

For dataservers, we tend to run a client and mount somewhere on that machine as part of the same process, to facilitate certain optimizations for local data.

mfs client [nameHost] [leaseHost] [localMountPoint]

Runs a standalone client against an existing cluster.

About

A distributed read/write filesystem in Go, bound to a local mountpoint using go-fuse.
