HM 9000 is a rewrite of CloudFoundry's Health Manager. HM 9000 is written in Golang and has a more modular architecture compard to the original ruby implementation.
As a result there are several Go Packages in this repository, each with a comprehensive set of unit tests. What follows is a detailed breakdown:
cloudfoundry/hm9000 will eventually be promoted and move to cloudfoundry/health_manager. This is the temporary home while it is under development.
hm9000 is not yet a complete replacement for health_manager -- we'll update this README when it's ready for primetime.
Assuming you have go
v1.1.* installed:
-
Setup a go workspace:
$ cd $HOME $ mkdir gospace $ export GOPATH=$HOME/gospace $ export PATH=$HOME/gospace/bin:$PATH $ cd gospace
-
Fetch HM9000 and its dependencies (not locked down for now):
$ go get -v github.com/cloudfoundry/hm9000 #you will need mercurial and git installed
-
Install
etcd
$ pushd ./src/github.com/coreos/etcd $ ./build $ mv etcd $GOPATH/bin/ $ popd
-
Start
etcd
. Open a new terminal session and:$ export PATH=$HOME/gospace/bin:$PATH $ cd $HOME $ mkdir etcdstorage $ cd etcdstorage $ etcd
etcd
generates a number of files in CWD when run locally, henceetcdstorage
-
Running
hm9000
. Back in the terminal you used togo get
hm9000 you should be able to$ hm9000
and get usage information
-
Running the tests
$ go get github.com/onsi/ginkgo/ginkgo $ cd src/github.com/cloudfoundry/hm9000/ $ ginkgo -r -skipMeasurements -race -failOnPending
These tests will spin up their own instances of
etcd
as needed. It shouldn't interfere with your long-runningetcd
server. -
Updating hm9000. You'll need to fetch the latest code and recompile the hm9000 binary:
$ cd $GOPATH/src/github.com/cloudfoundry/hm9000 $ git checkout master $ git pull $ go install .
hm9000
requires a config file. To get started:
$ cd $GOPATH/src/github.com/cloudfoundry/hm9000
$ cp ./config/default_config.json ./local_config.json
$ vim ./local_config.json
You must specify a config file for all the hm9000
commands. You do this with (e.g.) --config=./local_config.json
hm9000 fetch_desired --config=./local_config.json
will connect to CC, fetch the desired state, put it in the store under /desired
, then exit.
hm9000 listen --config=./local_config.json
will come up, listen to NATS for heartbeats, and put them in the store under /actual
, then exit.
etcd
has a very simple curlable API. For convenience:
hm9000 dump --config=./local_config.json
will dump the entire contents of the store to stdout.
The top level is home to the hm9000
CLI. The hm
package houses the CLI logic to keep the root directory cleaner. The hm
package is where the other components are instantiated, fed their dependencies, and executed.
The actualstatelistener
provides a simple listener daemon that monitors the NATS
stream for app heartbeats. It generates an entry in the store
for each heartbeating app under /actual/INSTANCE_GUID
.
It also maintains a FreshnessTimestamp
under /actual-fresh
to allow other components to know whether or not they can trust the information under /actual
Relevant config:
heartbeat_ttl_in_seconds
: the TTL to set on the/actual/INSTANCE_GUID
keysactual_freshness_ttl_in_seconds
: the TTL for the freshness key. Typically this should be the same asconfig.HeartbeatTTL
(if we don't hear back from NATS at all byconfig.HeartbeatTTL
then we should probably question the reliability of our NATS connection)actual_freshness_key
: the name of the freshness key to put in the store. The default/actual-fresh
should be adequate.nats.host
,nats.port
,nats.user
,nats.password
: the various bits and pieces needed to connect to NATS
The desiredstatefetcher
requests the desired state from the cloud controller. It transparently manages fetching the authentication information over NATS and making batched http requests to the bulk api endpoint.
Desired state is stored under `/desired/APP_GUID-APP_VERSION
Relevant config:
desired_state_ttl_in_seconds
: the TTL to set on the/desired/APP_GUID-APP_VERSION
keysdesired_freshness_ttl_in_seconds
: the TTL for the freshness key.desired_freshness_key
: the name of the freshness key to put in the store. The default/desired-fresh
should be adequate.CCAuthMessageBusSubject
: the message bus subject to use to fetch theCC
authentication credentials. Defaults tocloudcontroller.bulk.credentials.default
CCBaseURL
: the base path for the CC API endpoint`nats.host
,nats.port
,nats.user
,nats.password
: the various bits and pieces needed to connect to NATS
WIP: The analyzer
runs periodically and uses the outbox
to put start
and stop
messages in the queue.
WIP: The sender
runs periodically and uses the outbox
to pull messages off the queue and send them over NATS
. The sender
is responsible for throttling the rate at which messages are sent over NATS.
WIP: The api
is a simple HTTP server that provides access to information about the actual state. It uses the high availability store to fulfill these requests.
config
parses the config.json
configuration. Components are typically given an instance of config
by the hm
CLI.
helpers
contains a number of support utilities.
The freshnessmanager
manages writing and reading/interpreting freshness keys in the store.
A trivial wrapper around net/http
that improves testability of http requests.
Provides a (sys)logger. Eventually this will use steno to perform logging.
Provides a TimeProvider
. Useful for injecting time dependencies in tests.
Provides a worker pool with a configurable pool size. Work scheduled on the pool will run concurrently, but no more poolSize
workers can be running at any given moment.
WIP: outboux
is a library used by the analyzer and scheduler to add/remove messages from the message queue. The message queue is stored in the high availability store
. The outbox
guarantees that a message is only added once to the queue (i.e. if a start
message for a given app guid, version guid, and index has not been sent yet, the outbox
will prevent another identical start
message from being inserted in the queue).
models
encapsulates the various JSON structs that are sent/received over NATS/HTTP. Simple serializing/deserializing behavior is attached to these structs.
The store
is an generalized client for connecting to a Zookeeper/ETCD-like high availability store. Components that must read or write to the high availability store use this library. Writes are performed concurrently for optimal performance.
testhelpers
contains a (large) number of test support packages. These range from simple fakes to comprehensive libraries used for faking out other CloudFoundry components (e.g. heartbeating DEAs) in integration tests.
Provides a fake implementation of the helpers/freshnessmanager
interface
Provides a fake implementation of the helpers/logger
interface
Provides a fake implementation of the helpers/timeprovider
interface. Useful for injecting time dependency in test.
Provdes a fake implementation of the helpers/httpclient
interface that allows tests to have fine-grained control over the http request/response lifecycle.
app
is a simple domain object that encapsulates a running CloudFoundry app.
The app
package can be used to generate self-consistent data structures (heartbeats, desired state). These data structures are then passed into the other test helpers to simulate a CloudFoundry eco-system.
Think of app
as your source of fixture test data. It's intended to be used in integration tests and unit tests.
Some brief documentation -- look at the code and tests for more:
//get a new fixture app, this will generate appropriate
//random APP and VERSION GUIDs
app := NewApp()
//Get the desired state for the app. This can be passed into
//the desired state server to simulate the APP's precense in
//the CC's DB. By default the app is staged and started, to change
//this, modify the return value. Time is always injected.
desiredState := app.DesiredState(UPDATED_AT_TIMESTAMP)
//get an instance at index 0. this getter will lazily create and memoize
//instances and populate them with an INSTANCE_GUID and the correct
//INDEX.
instance0 := app.GetInstance(0)
//fetch, for example, the exit message for the instance. Time is always injected
exitedMessage := instance0.DropletExited(DropletExitReasonCrashed, TIMESTAMP)
//generate a heartbeat for the app. first argument is the # of instances,
// second is a timestamp denoting *when the app transitioned into its current state*
//note that the INSTANCE_GUID associated with the instance at index 0 will
//match that provided by app.GetInstance(0)
app.Heartbeat(2, TIMESTAMP)
Provides a simple mechanism to publish actual state related messages to the NATS bus. Handles JSON encoding.
Listens on the NATS bus for health.start
and health.stop
messages. It parses these messages and makes them available via a simple interface. Useful for testing that messages are sent by the health manager appropriately.
Brings up an in-process http server that mimics the CC's bulk endpoints (including authentication via NATS and pagination).
Brings up and manages the lifecycle of a live NATS server. After bringing the server up it provides a fully configured cfmessagebus object that you can pass to your test subjects.
Brings up and manages the lifecycle of a live ETCD server cluster.
The MCAT is comprised of two major integration test suites:
The MD
test suite excercises the HM9000
components through a series of integration-level tests. The individual components are designed to be simple and have comprehensive unit test coverage. However, it's crucial that we have comprehensive test coverage for the interactions between these components. That's what the MD
suite is for.
The PHD
suite is a collection of benchmark tests. This is a slow-running suite that is intended, primarily, to evaluate the performance of the various components (especially the high-availability store) under various loads.