BadWolf

BadWolf was born as a graph store loosely modeled after RDF. However, triples were expanded to quads to allow simpler temporal reasoning. Most of the web-related parts of RDF were never used; instead, temporal reasoning became the main reason for its existence. This project represents the evolution of the original BadWolf temporal graph store. Most of the original RDF structs have been removed, but BadWolf aims to retain its simplicity and flexibility.

In case you are curious about the name, BadWolf is named after the BadWolf entity as it appeared in the Doctor Who series after Rose Tyler looked into the Time Vortex itself. The BadWolf entity scattered events in time as self-encoded messages, creating a looped ontological paradox. Hence, naming a temporal graph store after the entity seemed appropriate.

Basic data abstractions

This section describes the three basic data abstractions that BadWolf provides. It is important to keep in mind that all data described using BadWolf abstractions is immutable. In other words, once you have created one of them, there is no way to mutate its value; however, you can always create new entities as needed.

Node

Nodes represent unique entities in a graph. A node is represented by two elements: (1) the ID that identifies the entity, and (2) the type of the entity. You may argue that collapsing both into a single element would achieve similar goals. However, there are benefits to keeping explicit type information (e.g. indexing and filtering).

Node Type

BadWolf does not provide a type ontology. Types are left to the owner of the data to express. That said, BadWolf requires type assertions to be expressed as hierarchies, written as paths separated by forward slashes. Examples of possible types are:

   /organization
   /organization/country
   /organization/company

Types follow a simple file path syntax. Only two operations are allowed on types.

  • Equality: Given two types A and B, A == B if and only if they have the exact same path representation. In other words, string(A) == string(B), where == is case-sensitive string equality.
  • Covariant: Given two types A and B, A covariant B if A is a B. In other words, A covariant B if B is a prefix of A. For example, /organization/company is covariant with /organization.
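
The two operations can be sketched in Go as follows. This is a minimal illustration, not BadWolf's own code: the `Type` name and its methods are hypothetical, and the check that a prefix must end on a path-segment boundary is an added assumption that the text above does not state.

```go
package main

import (
	"fmt"
	"strings"
)

// Type is a hypothetical stand-in for a node type path,
// for example "/organization/company".
type Type string

// Equal reports whether both types have the exact same,
// case-sensitive path representation.
func (t Type) Equal(o Type) bool {
	return t == o
}

// Covariant reports whether o is a prefix of t.
// Assumption: the prefix must end on a path-segment boundary, so
// "/organizations" is not treated as covariant with "/organization".
func (t Type) Covariant(o Type) bool {
	if t == o {
		return true
	}
	return strings.HasPrefix(string(t), string(o)+"/")
}

func main() {
	company := Type("/organization/company")
	org := Type("/organization")

	fmt.Println(company.Equal(org))     // false
	fmt.Println(company.Covariant(org)) // true
	fmt.Println(org.Covariant(company)) // false
}
```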

Node ID

BadWolf does not make any assumption about ID structure. IDs are represented as UTF-8 strings. No spaces, tabs, LF, or CR are allowed as part of the ID, so that nodes can be marshaled and unmarshaled efficiently. For the same reason, node IDs cannot contain '<' or '>'.

Marshaled representation of a node

Nodes can be marshaled to and unmarshaled from a simple text representation. The representation follows the structure type<id> for efficient processing. Some examples of nodes marshaled into text are listed below.

   /organization/country<United States of America>
   /organization/company<Google>

Node equality

Two nodes are equal if their ID and type are equal.
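
As a rough illustration of the type<id> representation and of node equality, the sketch below marshals and unmarshals a simplified node. The `Node` struct and `ParseNode` function are hypothetical names chosen for this example; they are not taken from the BadWolf packages.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical, simplified node: a type path plus an ID.
type Node struct {
	Type string
	ID   string
}

// String marshals the node as type<id>.
func (n Node) String() string {
	return n.Type + "<" + n.ID + ">"
}

// ParseNode unmarshals the type<id> text form.
func ParseNode(s string) (Node, error) {
	open := strings.Index(s, "<")
	if open <= 0 || !strings.HasSuffix(s, ">") {
		return Node{}, fmt.Errorf("malformed node %q", s)
	}
	id := s[open+1 : len(s)-1]
	if strings.ContainsAny(id, "<>") {
		return Node{}, fmt.Errorf("node ID %q contains '<' or '>'", id)
	}
	return Node{Type: s[:open], ID: id}, nil
}

func main() {
	n, err := ParseNode("/organization/company<Google>")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(n.Type, n.ID) // /organization/company Google
	fmt.Println(n)            // /organization/company<Google>

	// Two nodes are equal when both their type and their ID match.
	fmt.Println(n == Node{Type: "/organization/company", ID: "Google"}) // true
}
```

Because the struct only holds comparable fields, Go's built-in == already implements the equality rule stated above.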

Literals

Literals are data containers. BadWolf has only a few primitive types that are allowed to be boxed in a literal. These types are:

  • Bool indicates that the type contained in the literal is a bool.
  • Int64 indicates that the type contained in the literal is an int64.
  • Float64 indicates that the type contained in the literal is a float64.
  • String indicates that the type contained in the literal is a string.
  • Blob indicates that the type contained in the literal is a []byte.

It is important to note that a literal contains one value, and one value only. Also, as mentioned earlier, all values and, hence, literals are immutable. String and Blob can contain elements of arbitrary length. This can be problematic depending on the storage backend being used. For that reason, the literal package provides mechanisms to enforce maximum length limits to protect storage backends.

Two literal builders are provided to create new literals:

  • DefaultBuilder allows building valid literals of unbounded size.
  • NewBoundedBuilder allows building valid literals up to a specified maximum size.
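
To make the difference between an unbounded and a bounded builder concrete, here is a small sketch. It does not reproduce the literal package's API: the `Literal` and `builder` types, the `Build` method, and the convention that a non-positive limit means "unbounded" are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Literal is a hypothetical immutable box holding exactly one value.
type Literal struct {
	v interface{}
}

// Value returns the boxed value.
func (l Literal) Value() interface{} { return l.v }

// builder creates literals. Assumption: max <= 0 means unbounded.
type builder struct {
	max int
}

var errTooLong = errors.New("literal exceeds maximum length")

// Build boxes v if it is one of the supported primitive types and,
// for strings and blobs, if it fits within the configured limit.
func (b builder) Build(v interface{}) (Literal, error) {
	switch x := v.(type) {
	case bool, int64, float64:
		// Fixed-size values are always accepted.
	case string:
		if b.max > 0 && len(x) > b.max {
			return Literal{}, errTooLong
		}
	case []byte:
		if b.max > 0 && len(x) > b.max {
			return Literal{}, errTooLong
		}
		// Copy the bytes so later changes by the caller cannot
		// alter the literal.
		v = append([]byte(nil), x...)
	default:
		return Literal{}, fmt.Errorf("unsupported literal type %T", v)
	}
	return Literal{v: v}, nil
}

func main() {
	unbounded := builder{}
	bounded := builder{max: 4}

	l, err := unbounded.Build("some random string")
	fmt.Println(l.Value(), err)

	_, err = bounded.Build("some random string")
	fmt.Println(err)
}
```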

Literals can be pretty printed into a string format. The pretty printing retains the type and value of the literal. The format consists of the string representation of the value between quotes, followed by ^^ and the type assertion type: with the corresponding type appended. This convention loosely follows the RDF specification for literals, which also simplifies parsing such string-formatted literals. Some examples of pretty printed literals are shown below.

  "true"^^type:bool
  "false"^^type:bool
  "-1"^^type:int64
  "0"^^type:int64
  "1"^^type:int64
  "-1"^^type:float64
  "0"^^type:float64
  "1"^^type:float64
  ""^^type:text
  "some random string"^^type:text
  "[]"^^type:blob
  "[115 111 109 101 32 114 97 110 100 111 109 32 98 121 116 101 115]"^^type:blob

The above representation can also be used to create a literal.
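
The pretty printing convention can be sketched with two small functions. `Pretty` and `Parse` are hypothetical helpers written for this example, not the literal package's functions; the sketch does not escape quotes inside text values and leaves blob parsing out for brevity.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// sep separates the quoted value from the type name.
const sep = "\"^^type:"

// Pretty renders a value as "value"^^type:name.
func Pretty(v interface{}) (string, error) {
	var name string
	switch v.(type) {
	case bool:
		name = "bool"
	case int64:
		name = "int64"
	case float64:
		name = "float64"
	case string:
		name = "text"
	case []byte:
		name = "blob"
	default:
		return "", fmt.Errorf("unsupported literal type %T", v)
	}
	return fmt.Sprintf("\"%v\"^^type:%s", v, name), nil
}

// Parse reads a pretty printed literal back into a Go value.
// Blob parsing is omitted in this sketch.
func Parse(s string) (interface{}, error) {
	i := strings.LastIndex(s, sep)
	if i < 1 || s[0] != '"' {
		return nil, fmt.Errorf("malformed literal %q", s)
	}
	raw, name := s[1:i], s[i+len(sep):]
	switch name {
	case "bool":
		return strconv.ParseBool(raw)
	case "int64":
		return strconv.ParseInt(raw, 10, 64)
	case "float64":
		return strconv.ParseFloat(raw, 64)
	case "text":
		return raw, nil
	}
	return nil, fmt.Errorf("unsupported literal type %q", name)
}

func main() {
	s, _ := Pretty(int64(-1))
	fmt.Println(s) // "-1"^^type:int64

	b, _ := Pretty([]byte("some"))
	fmt.Println(b) // "[115 111 109 101]"^^type:blob

	v, err := Parse(`"true"^^type:bool`)
	fmt.Println(v, err) // true <nil>
}
```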

Predicates

Predicates allow predicating properties of nodes. BadWolf provides two different kinds of predicates:

  • Immutable predicates are always valid regardless of when they were created. They are useful to describe properties that never change, for instance, the color of someone's eyes.
  • Temporal predicates are anchored at some point along the time continuum. For instance, the predicate met, describing when two nodes met, is anchored at a particular time.

It is important to note here that temporal predicates describe a property in relation to time. The granularity (or window) of validity of a predicate is left to the temporal reasoning module. This is important, since it allows reasoning against arbitrary time granularities. All time calculations and reasoning in BadWolf assume a Gregorian calendar.

Predicate ID

Similar to node IDs, BadWolf does not make any assumption about the structure of predicate IDs. IDs are represented as UTF-8 strings. No spaces, tabs, LF, or CR are allowed as part of the ID, so that predicates can be marshaled and unmarshaled efficiently.

Time anchors

When parsing or printing dates into time anchors for temporal predicates, BadWolf follows the RFC3339 variant RFC3339Nano as specified in the Go programming language, which provides reliable granularity to express anchors in nanoseconds. An example of a time anchor expressed in RFC3339Nano format is shown below.

   2006-01-02T15:04:05.999999999Z07:00

So, for instance, the fully pretty printed predicates for an immutable and a temporal triple are shown below.

   "color_of_eyes"@[]
   "met"@[2006-01-02T15:04:05.999999999Z07:00]
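
The sketch below shows one way to produce both forms with Go's time package. The `Predicate` struct and `ParseAnchor` helper are hypothetical. Note that the anchor string shown above is Go's layout pattern for RFC3339Nano; a concrete anchor carries either Z or a numeric offset such as -07:00 in place of Z07:00, which is what the example uses.

```go
package main

import (
	"fmt"
	"time"
)

// Predicate is a hypothetical, simplified predicate.
// A nil Anchor marks the predicate as immutable.
type Predicate struct {
	ID     string
	Anchor *time.Time
}

// ParseAnchor parses a time anchor written in RFC3339Nano.
func ParseAnchor(s string) (time.Time, error) {
	return time.Parse(time.RFC3339Nano, s)
}

// String pretty prints the predicate as "id"@[anchor], with an
// empty anchor for immutable predicates.
func (p Predicate) String() string {
	if p.Anchor == nil {
		return fmt.Sprintf("%q@[]", p.ID)
	}
	return fmt.Sprintf("%q@[%s]", p.ID, p.Anchor.Format(time.RFC3339Nano))
}

func main() {
	fmt.Println(Predicate{ID: "color_of_eyes"}) // "color_of_eyes"@[]

	anchor, err := ParseAnchor("2006-01-02T15:04:05.999999999-07:00")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// "met"@[2006-01-02T15:04:05.999999999-07:00]
	fmt.Println(Predicate{ID: "met", Anchor: &anchor})
}
```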

Triple

The basic unit of storage in BadWolf is the triple. A triple is a three-element tuple <s p o> defined as follows:

  • s, or subject, is a BadWolf node.
  • p, or predicate, is a BadWolf predicate.
  • o, or object, is either a BadWolf node, predicate, or literal.

Triples can be marshaled and unmarshaled. The string representation of a triple is just the string representation of each of its components separated by a blank separator (tab is the preferred blank separator).
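
A minimal sketch of this text form follows. It simplifies each component to its already marshaled string and assumes the tab separator is always used, so that objects containing spaces (such as text literals) stay intact. `Triple` and `ParseTriple` are hypothetical names for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// Triple holds the marshaled text of each component (a simplification).
type Triple struct {
	Subject, Predicate, Object string
}

// String marshals the triple using tab as the separator.
func (t Triple) String() string {
	return strings.Join([]string{t.Subject, t.Predicate, t.Object}, "\t")
}

// ParseTriple splits a tab-separated line into its three components.
// Assumption: only tab is used as separator in the input.
func ParseTriple(line string) (Triple, error) {
	parts := strings.SplitN(line, "\t", 3)
	if len(parts) != 3 {
		return Triple{}, fmt.Errorf("expected 3 tab-separated components, got %d", len(parts))
	}
	return Triple{Subject: parts[0], Predicate: parts[1], Object: parts[2]}, nil
}

func main() {
	in := Triple{
		Subject:   "/user<John>",
		Predicate: `"color_of_eyes"@[]`,
		Object:    `"dark brown"^^type:text`,
	}
	out, err := ParseTriple(in.String())
	fmt.Println(out == in, err) // true <nil>
}
```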

Blank nodes and triple reification

A blank node is a node of type /_ whose ID is unique in BadWolf. You can request BadWolf to create blank nodes. The main use of blank nodes is to allow triple reification, that is, predicating properties about facts. It is important to keep in mind that predicating properties about a node can be achieved with a triple; however, predicating properties about a fact (a triple) requires reification. This is better explained with an example.

Let's assume we have the following fact:

  /user<John> "met"@[2006-01-02T15:04:05.999999999Z07:00] /user<Mary>

This represents the fact that John met Mary back in 2006. Suppose they met in New York. That location is a property (location New York) of the original fact (John met Mary). To express it while maintaining a uniform data representation, you need a way to encode such information as triples.

Reification is the process of predicating properties by adding new triples. This is achieved by creating a new blank node and using three special internal predicates _subject, _predicate, _object. Reifying the above triple would add the following triples.

  /user<John> "met"@[2006-01-02T15:04:05.999999999Z07:00] /user<Mary>
  /_<BUID> "_subject"@[2006-01-02T15:04:05.999999999Z07:00] /user<John>
  /_<BUID> "_predicate"@[2006-01-02T15:04:05.999999999Z07:00] "met"@[2006-01-02T15:04:05.999999999Z07:00]
  /_<BUID> "_object"@[2006-01-02T15:04:05.999999999Z07:00] /user<Mary>

Reifying a temporal triple anchors all the derived temporal triples at the same time anchor as the original triple. Now you can predicate any property about the fact by predicating against the blank node. Hence, we can now state where John and Mary met, as shown below.

  /user<John> "met"@[2006-01-02T15:04:05.999999999Z07:00] /user<Mary>
  /_<BUID> "_subject"@[2006-01-02T15:04:05.999999999Z07:00] /user<John>
  /_<BUID> "_predicate"@[2006-01-02T15:04:05.999999999Z07:00] "met"@[2006-01-02T15:04:05.999999999Z07:00]
  /_<BUID> "_object"@[2006-01-02T15:04:05.999999999Z07:00] /user<Mary>
  /_<BUID> "location"@[2006-01-02T15:04:05.999999999Z07:00] /city<New York>

Anchoring the temporal predicate at the same time anchor as the reified triples seems appropriate for this example, but there are no restrictions on what you predicate against blank nodes.
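
The reification steps described in this section can be sketched as a small function. This is an illustration under stated assumptions: triples are simplified to strings, the `Reify` function is hypothetical, and the caller supplies the blank node ID because the text above does not say how BadWolf generates it.

```go
package main

import "fmt"

// Triple holds the marshaled text of subject, predicate, and object.
type Triple struct{ S, P, O string }

// Reify returns the original triple followed by the three triples
// that describe it through the blank node /_<buid>. The anchor is
// the time anchor of the original triple (empty for immutable ones).
func Reify(t Triple, buid, anchor string) []Triple {
	blank := "/_<" + buid + ">"
	at := func(id string) string {
		return fmt.Sprintf("%q@[%s]", id, anchor)
	}
	return []Triple{
		t,
		{blank, at("_subject"), t.S},
		{blank, at("_predicate"), t.P},
		{blank, at("_object"), t.O},
	}
}

func main() {
	anchor := "2006-01-02T15:04:05.999999999-07:00"
	fact := Triple{"/user<John>", fmt.Sprintf("%q@[%s]", "met", anchor), "/user<Mary>"}

	triples := Reify(fact, "BUID", anchor)

	// Any property of the fact can now be predicated against the blank node.
	triples = append(triples, Triple{
		"/_<BUID>",
		fmt.Sprintf("%q@[%s]", "location", anchor),
		"/city<New York>",
	})

	for _, tr := range triples {
		fmt.Println(tr.S, tr.P, tr.O)
	}
}
```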

Storage abstraction layer

BadWolf does not provide any storage. Instead, it provides a low-level API for data persistence. This allows different storage implementations (sometimes known as drivers) to be provided while maintaining the same data abstractions and data manipulation. This property allows you to use your favorite backend for data storage, or to implement a new one for your next project.

The BadWolf release comes only with a simple volatile, RAM-based implementation of the storage abstraction layer to illustrate how the API can be implemented.

The storage abstraction layer is built around two simple interfaces:

  • storage.Store interface: Allows creating new named graphs.
  • storage.Graph interface: Provides a low-level API to manipulate and look up the data stored in the graph. It is important not to confuse the data lookup capabilities with the BadWolf query language.

The goal of these interfaces is to allow writing specialized drivers for different storage backends. For instance, BadWolf provides a simple memory-based implementation of these interfaces in the storage/memory package.
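
To give a feel for the driver idea, the sketch below defines two reduced interfaces and a volatile in-memory implementation. The method sets shown here are assumptions made for illustration and are much smaller than a real driver would need; they are not the actual definitions of storage.Store and storage.Graph.

```go
package main

import (
	"fmt"
	"sync"
)

// Triple holds the marshaled text of subject, predicate, and object.
type Triple struct{ S, P, O string }

// Graph is a hypothetical, reduced graph interface.
type Graph interface {
	ID() string
	AddTriples(ts []Triple) error
	Exists(t Triple) (bool, error)
}

// Store is a hypothetical, reduced store interface.
type Store interface {
	NewGraph(id string) (Graph, error)
}

// memGraph keeps triples in a set guarded by a mutex.
type memGraph struct {
	id      string
	mu      sync.RWMutex
	triples map[Triple]struct{}
}

func (g *memGraph) ID() string { return g.id }

func (g *memGraph) AddTriples(ts []Triple) error {
	g.mu.Lock()
	defer g.mu.Unlock()
	for _, t := range ts {
		g.triples[t] = struct{}{}
	}
	return nil
}

func (g *memGraph) Exists(t Triple) (bool, error) {
	g.mu.RLock()
	defer g.mu.RUnlock()
	_, ok := g.triples[t]
	return ok, nil
}

// memStore is a volatile store that holds named graphs in RAM.
type memStore struct {
	mu     sync.Mutex
	graphs map[string]*memGraph
}

// NewMemoryStore returns an empty in-memory store.
func NewMemoryStore() Store {
	return &memStore{graphs: map[string]*memGraph{}}
}

func (s *memStore) NewGraph(id string) (Graph, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.graphs[id]; ok {
		return nil, fmt.Errorf("graph %q already exists", id)
	}
	g := &memGraph{id: id, triples: map[Triple]struct{}{}}
	s.graphs[id] = g
	return g, nil
}

func main() {
	store := NewMemoryStore()
	g, err := store.NewGraph("family")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fact := Triple{"/user<John>", `"color_of_eyes"@[]`, `"brown"^^type:text`}
	_ = g.AddTriples([]Triple{fact})
	ok, _ := g.Exists(fact)
	fmt.Println(g.ID(), ok) // family true
}
```

Code written against the two interfaces does not need to change when the in-memory store is swapped for a driver backed by a different storage system.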
