Rabbit Tree
Radix bit tries for implementing associative arrays and sets in C.
|
Rabbit trees (radix bit tries) are binary radix tries that use bit arrays as keys.
A visual representation should help to clarify the concept before proceeding. The following is a rabbit tree that associates string constants with 32-bit keys:
00000 ├0 │├0 ││├0000000000000000000000000 zero ││└1000000000000000000000000 one │└1 │ ├0000000000000000000000000 two │ └1000000000000000000000000 three └1 ├0 │├0000000000000000000000000 four │└1000000000000000000000000 five └1 ├0000000000000000000000000 six └1000000000000000000000000 seven
Each line represents a node. The leading "0"s and "1"s represent the bits in each node's key fragment. Box-drawing characters show the tree hierarchy, with the first line representing the root node. Strings at the ends of lines represent values held by nodes. The key associated with a node is the sequence of key fragments that start at the root node and end at the target node.
For example, the key associated with the node containing the string "four" is 00000
+ 1
+ 0
+ 0000000000000000000000000
= 00000100000000000000000000000000
. Incidentally, this corresponds to a 32-bit int
with the value 4, because the tree maps integer numbers to their English string representations.
Lookups proceed by matching fragments of a query key against the key fragment of a node. When all of the bits match, the next bit of the query key determines which child will be used to continue the query.
For example, given the key 00000110000000000000000000000000
and the previous tree, the following will occur, starting at the root node:
00000
+ 110000000000000000000000000
. The next bit in the query key is 1
so the right child is chosen (the lower child in the above representation).1
+ 10000000000000000000000000
. The bit after that is also 1
, so go right again.1
+ 0000000000000000000000000
. The bit after that is 0
, so go left.0000000000000000000000000
This is the associated node.If a pointer to the underlying contiguous bits can be acquired for a given data type then objects of that type can be used as keys. This means that rabbit trees can be used with almost any data type, such as int
s, float
s, struct
s and even strings. The key length does not need to be fixed either.
A rabbit tree requires at most n-1 placeholder nodes to hold n values and the depth of a tree of key size k cannot exceed k. Lookups, insertions and deletions are done O(k) time in the worst case scenario, which only occurs when the tree is full.
Rabbit tree keys are implemented as arrays of unsigned integer types. The type of integer used to implement the array is referred to as the "pin" type due to the analogy with a pin tumbler lock.
Wrapper functions are provided to enable transparent conversion between user key types (e.g. ints, structs, strings) and the internal key type represented as an array of pins.
To provide the greatest flexibility, rabbit trees rely on macros and use several header files to enable code re-use. It is therefore possible to use nodes holding a given value type with different key types without redundant definitions of the node functions.
Each header will require certain macros to be defined before it is included. These are detailed in the documentation for each header.