Skip to content

alecthomas/vheap

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 

Repository files navigation

Virtual Heap

This Go package provides fast, mmapped access to a virtual heap. Each block allocated from the heap is assigned a unique, persistent reference ID that can be used to address the block across application runs.

The implementation is currently very basic, but sufficient for its intended purpose.

Use-cases

Why would you want to use this?

It is useful for situations where you want very low overhead object creation and tight control over the layout of and access to, data.

Its initial use will be in a search index, storing posting lists, document names, scoring information, and so on.

Example

Creating a heap and adding some data:

// Create a heap with an initial size of 32MB.
h, err := vheap.OpenForUpdate("hello.vheap", 32)
defer h.Close()

// Allocate a 32 byte block.
b, err := h.Allocate(32)

// Print the block's persistent ID (the first block is guaranteed to have an ID of 0,
// which is useful for storing references to other blocks)
println(b.Id)

// Copy some text into the block
copy(b.Bytes, []byte("hello world"))

Open the heap and retrieve the created block:

h, err := vheap.Open("hello.vheap")
defer h.Close()
b := h.GetBlock(0)
println(string(b.Bytes))

Details

The storage consists of a single file partitioned into multiple regions. Each region is mmapped. When a region fills up, a new region is appended. Blocks are allocated from the first region with sufficient space.

Currently, deallocated blocks are not reclaimed. This will lead to heavy fragmentation if blocks are frequently freed.

A region size is specified at index creation, typically in the order of 32-128MB. Total index size is increased by the region size when an allocation does not fit in any existing region. Additionally, if the new allocation exceeds the region size, the new region will be expanded to fit the allocation.

On-disk Structure

The structure at the start of each region is:

signature [8]byte     // Region signature
freeListPointer int64 // Pointer to start of free memory
regionSize int64      // Size, in bytes, of each region, including header and block list.
regionId int64        // Region ID, starting at 0.
data []byte           // Data

The block list is located at the end of each region, growing down:

id int64                             // Next free block ID in this region
entries []struct{offset, size int64} // Array of offset and raw size of allocated blocks.

When the top of the heap reaches the bottom of the index, a new region is allocated.

Concurrent Access

Allocations are (almost) atomic on 64-bit machines. Almost atomic in that free space is allocated first, then the list of block IDs is updated. If termination occurs in between these two operations, free space will be lost, though the heap will remain consistent.

The implication of this consistency is that a storage block can be read from concurrently while being written to by another process, though currently there is no support for automatically opening appended regions in readers.

About

Fast, persistent, mmapped, virtual heap.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages