Rabbit Tree
Radix bit tries for implementing associative arrays and sets in C.
Rabbit Tree

Introduction

Overview

Rabbit trees (radix bit tries) are binary radix tries that use bit arrays as keys.

A visual representation should help to clarify the concept before proceeding. The following is a rabbit tree that associates string constants with 32-bit keys:

00000
    ├0
    │├0
    ││├0000000000000000000000000 zero
    ││└1000000000000000000000000 one
    │└1
    │ ├0000000000000000000000000 two
    │ └1000000000000000000000000 three
    └1
     ├0
     │├0000000000000000000000000 four
     │└1000000000000000000000000 five
     └1
      ├0000000000000000000000000 six
      └1000000000000000000000000 seven

Each line represents a node. The leading "0"s and "1"s represent the bits in each node's key fragment. Box-drawing characters show the tree hierarchy, with the first line representing the root node. Strings at the ends of lines represent values held by nodes. The key associated with a node is the sequence of key fragments that start at the root node and end at the target node.

For example, the key associated with the node containing the string "four" is 00000 + 1 + 0 + 0000000000000000000000000 = 00000100000000000000000000000000. Incidentally, this corresponds to a 32-bit int with the value 4, because the tree maps integer numbers to their English string representations.

Lookups proceed by matching fragments of a query key against the key fragment of a node. When all of the bits match, the next bit of the query key determines which child will be used to continue the query.

For example, given the key 00000110000000000000000000000000 and the previous tree, the following will occur, starting at the root node:

  • The root node's key fragment matches the first 5 bits of the query key: 00000 + 110000000000000000000000000. The next bit in the query key is 1 so the right child is chosen (the lower child in the above representation).
  • The node's single bit matches the next bit of the query key: 1 + 10000000000000000000000000. The bit after that is also 1, so go right again.
  • The node's single bit matches the next bit of the query key: 1 + 0000000000000000000000000. The bit after that is 0, so go left.
  • The node's bits match the remaining query key bits: 0000000000000000000000000 This is the associated node.

Motivation

If a pointer to the underlying contiguous bits can be acquired for a given data type then objects of that type can be used as keys. This means that rabbit trees can be used with almost any data type, such as ints, floats, structs and even strings. The key length does not need to be fixed either.

A rabbit tree requires at most n-1 placeholder nodes to hold n values and the depth of a tree of key size k cannot exceed k. Lookups, insertions and deletions are done O(k) time in the worst case scenario, which only occurs when the tree is full.

Implementation

Concepts

Rabbit tree keys are implemented as arrays of unsigned integer types. The type of integer used to implement the array is referred to as the "pin" type due to the analogy with a pin tumbler lock.

Wrapper functions are provided to enable transparent conversion between user key types (e.g. ints, structs, strings) and the internal key type represented as an array of pins.

Code

To provide the greatest flexibility, rabbit trees rely on macros and use several header files to enable code re-use. It is therefore possible to use nodes holding a given value type with different key types without redundant definitions of the node functions.

Each header will require certain macros to be defined before it is included. These are detailed in the documentation for each header.

Contact
echo xyne.archlinux.org | sed 's/\./@/'