Earthquake: Programmable Fuzzy Scheduler for Testing Distributed Systems

Chat: https://gitter.im/osrg/earthquake

Earthquake is a programmable fuzzy scheduler for testing real implementations of distributed systems (such as ZooKeeper).

Blog: http://osrg.github.io/earthquake/

Earthquake permutes C/Java function calls, Ethernet packets, filesystem events, and injected faults in various orders to find implementation-level bugs in the distributed system. Earthquake can also control the non-determinism of thread interleaving (by calling sched_setattr(2) with randomized parameters), so it can also be used for testing standalone multi-threaded software.

Basically, Earthquake permutes events in a random order, but you can write your own state exploration policy (in Golang) for finding deep bugs efficiently.
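As a rough illustration of the default random behaviour, the sketch below shuffles a batch of pending events with a seeded random source. The event type, the permute function, and the event names are hypothetical stand-ins and not part of Earthquake's API; the real scheduler works on live, intercepted events rather than on a slice.

```go
package main

import (
	"fmt"
	"math/rand"
)

// event is a hypothetical stand-in for an intercepted event
// (a packet, a filesystem call, a function call, and so on).
type event string

// permute returns a shuffled copy of events and leaves the input untouched.
// A fixed seed gives the same order on every run, which helps when
// replaying a schedule that exposed a bug.
func permute(events []event, seed int64) []event {
	out := make([]event, len(events))
	copy(out, events)
	rng := rand.New(rand.NewSource(seed))
	rng.Shuffle(len(out), func(i, j int) {
		out[i], out[j] = out[j], out[i]
	})
	return out
}

func main() {
	pending := []event{"packet:A->B", "fs:write", "java:commit", "packet:B->A"}
	fmt.Println(permute(pending, 42))
}
```

A custom exploration policy would replace the uniform shuffle with an ordering chosen from what it has observed so far.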

Found/Reproduced Bugs

Quick Start

The following instructions show how to start Earthquake Container, the simplified CLI for Earthquake.

$ sudo apt-get install libzmq3-dev libnetfilter-queue-dev
$ go get github.com/osrg/earthquake/earthquake-container
$ sudo earthquake-container run -it --rm ubuntu bash

In Earthquake Container, you can run any command that might be flaky. JUnit tests are interesting to try.

earthquake-container$ git clone something
earthquake-container$ cd something
earthquake-container$ for f in $(seq 1 1000);do mvn test; done

You can also specify a config file with the -eq-config option of earthquake-container. A typical configuration file (config.toml) looks like this:

# Policy for observing events and yielding actions
# You can also implement your own policy.
# Default: "random"
explorePolicy = "random"

[explorePolicyParam]
  # For the Ethernet/Filesystem/Java inspectors, events are non-deterministically delayed.
  # minInterval and maxInterval are bounds for the non-deterministic delays
  # Default: 0 and 0
  minInterval = "80ms"
  maxInterval = "3000ms"

[containerParam]
  # Default: false
  enableEthernetInspector = true
  # Default: true
  enableProcInspector = true
  # Default: "1s"
  procWatchInterval = "1s"
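To make the role of minInterval and maxInterval concrete, here is a minimal sketch of how a delay could be drawn between the two bounds. The randomDelay function is hypothetical, and the uniform distribution is an assumption; this is not Earthquake's implementation, only an illustration of what "bounds for the non-deterministic delays" can mean.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// randomDelay parses the two bounds (in the same string form used in
// config.toml, e.g. "80ms") and returns a delay drawn uniformly from
// [minInterval, maxInterval]. Illustrative only.
func randomDelay(minInterval, maxInterval string) (time.Duration, error) {
	lo, err := time.ParseDuration(minInterval)
	if err != nil {
		return 0, fmt.Errorf("minInterval: %w", err)
	}
	hi, err := time.ParseDuration(maxInterval)
	if err != nil {
		return 0, fmt.Errorf("maxInterval: %w", err)
	}
	if lo < 0 || hi < lo {
		return 0, fmt.Errorf("invalid bounds: min=%v max=%v", lo, hi)
	}
	// Int63n panics when its argument is <= 0, so add 1: this makes the
	// upper bound inclusive and also covers the case lo == hi.
	return lo + time.Duration(rand.Int63n(int64(hi-lo)+1)), nil
}

func main() {
	d, err := randomDelay("80ms", "3000ms")
	if err != nil {
		panic(err)
	}
	fmt.Println("delaying event by", d)
}
```

With the default bounds of 0 and 0, this sketch always returns a zero delay, so events pass through without being held back.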

If you don't want to use containers, you can also use Earthquake with an arbitrary process tree.

$ go get github.com/osrg/earthquake/earthquake
$ sudo earthquake inspectors proc -root-pid $TARGET_PID -watch-interval 1s -autopilot config.toml

For a full-stack (fully distributed) Earthquake environment, please refer to doc/how-to-setup-env-full.md.

The slides for the presentation at FOSDEM might also be helpful.

Talks

How to Contribute

We welcome your contributions to Earthquake. Please feel free to send pull requests on GitHub!

Copyright

Copyright (C) 2015 Nippon Telegraph and Telephone Corporation.

Released under Apache License 2.0.


API Overview

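package main

import (
	"fmt"
	"os"
	// The Earthquake packages that provide Action, Event, ExplorePolicy,
	// RegisterPolicy, and CLIMain must be imported here as well;
	// see example/template for the exact import paths.
)

// MyPolicy implements the earthquake/explorepolicy/ExplorePolicy interface.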
type MyPolicy struct {
	actionCh chan Action
}

func (p *MyPolicy) GetNextActionChan() chan Action {
	return p.actionCh
}

func (p *MyPolicy) QueueNextEvent(event Event) {
	// Possible events:
	//  - JavaFunctionEvent (byteman)
	//  - PacketEvent (Netfilter, Openflow)
	//  - FilesystemEvent (FUSE)
	//  - ProcSetEvent (Linux procfs)
	//  - LogEvent (syslog)
	fmt.Printf("Event: %s\n", event)
	// You can also inject fault actions
	//  - PacketFaultAction
	//  - FilesystemFaultAction
	//  - ProcSetSchedAction
	//  - ShellAction
	action, err := event.DefaultAction()
	if err != nil {
		panic(err)
	}
	// send in a goroutine so as to make the function non-blocking.
	// (Note that earthquake/util/queue/TimeBoundedQueue provides
	// better semantics and determinism, this is just an example.)
	go func() {
		fmt.Printf("Action ready: %s\n", action)
		p.actionCh <- action
		fmt.Printf("Action passed: %s\n", action)
	}()
}

func NewMyPolicy() ExplorePolicy {
	return &MyPolicy{actionCh: make(chan Action)}
}

func main() {
	RegisterPolicy("mypolicy", NewMyPolicy)
	os.Exit(CLIMain(os.Args))
}

Please refer to example/template for further information.
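To experiment with the shape of this contract without the Earthquake packages, the sketch below replaces them with hypothetical stand-in types. Action, Event, namedEvent, acceptAction, passThroughPolicy, and drive are all simplifications invented for this illustration; the real definitions live in Earthquake and may differ. The drive function plays the role of the caller that queues events and then reads the resulting actions from the channel.

```go
package main

import "fmt"

// Action and Event are simplified stand-ins, not Earthquake's real types.
type Action interface{ String() string }

type Event interface {
	String() string
	DefaultAction() (Action, error)
}

// acceptAction is a hypothetical default action: let the event proceed.
type acceptAction struct{ of string }

func (a acceptAction) String() string { return "accept(" + a.of + ")" }

// namedEvent is a hypothetical event identified only by a name.
type namedEvent struct{ name string }

func (e namedEvent) String() string                 { return e.name }
func (e namedEvent) DefaultAction() (Action, error) { return acceptAction{e.name}, nil }

// passThroughPolicy mirrors MyPolicy: every queued event yields its
// default action on an unbuffered channel.
type passThroughPolicy struct {
	actionCh chan Action
}

func (p *passThroughPolicy) GetNextActionChan() chan Action { return p.actionCh }

func (p *passThroughPolicy) QueueNextEvent(event Event) {
	action, err := event.DefaultAction()
	if err != nil {
		panic(err)
	}
	// Send from a goroutine so QueueNextEvent never blocks the caller.
	go func() { p.actionCh <- action }()
}

// drive queues one event per name, then receives one action per event.
// Because each send happens in its own goroutine, the order in which
// actions arrive is not guaranteed to match the order of the events.
func drive(names []string) []string {
	p := &passThroughPolicy{actionCh: make(chan Action)}
	for _, n := range names {
		p.QueueNextEvent(namedEvent{n})
	}
	got := make([]string, 0, len(names))
	for range names {
		got = append(got, (<-p.GetNextActionChan()).String())
	}
	return got
}

func main() {
	fmt.Println(drive([]string{"e1", "e2", "e3"}))
}
```

The unordered arrival is the reason the comment in MyPolicy points to a time-bounded queue for better determinism: one goroutine per action leaves the ordering to the Go scheduler.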

Known Limitation

After running Earthquake (process inspector) many times, sched_setattr(2) can fail with EBUSY. This seems to be a kernel bug; we're looking into it.
