CIS 5511. Programming Techniques

B-trees and Fibonacci Heaps

1. 2-3 Trees and 2-3-4 Trees

The idea behind the 2-3 tree and some related data structures is to extend a binary tree into a "multi-way" tree. Instead of putting 1 key value and 2 branches into each node, we can put m keys and m+1 branches, and keep them "sorted" as in binary search trees.

A 2-3 tree is a tree where each node contains 1 or 2 key values, so each internal node has 2 or 3 children. A 2-node has 1 key and (if it is internal) 2 children, and a 3-node has 2 keys and (if it is internal) 3 children. All leaves are on the same level.

```
             [|7|]
            /     \
       [|3|]       [|11|15|]
       /   \       /   |   \
     [1]   [5]   [9] [13] [17 19]
```
Searching a 2-3 tree is similar to searching a binary search tree, except that within each 3-node, in the worst case two comparisons are needed.

Insertion: In a 2-3 tree, a new key value is initially inserted into a leaf, according to the order of a search tree. Then there are the following possibilities:

1. If the leaf was a 2-node before the insertion, it becomes a 3-node.
2. If the leaf was a 3-node, it now holds three keys, so it is split into two 2-nodes and the middle value is inserted into the parent. This process may repeat at the parent. If the root splits, a new root is created and the tree grows by one level.
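
The two insertion cases can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names `Node23` and `insert` are made up for this example, duplicate keys are simply ignored (an assumption the notes do not state), and the demonstration inserts 18 into the example tree drawn above.

```python
class Node23:
    """A 2-3 tree node: 1 or 2 keys; an internal node has len(keys) + 1 children."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []   # empty list for a leaf


def _insert(node, key):
    """Insert key below node. Return None, or (middle, right) if node had to split."""
    if key in node.keys:
        return None                      # assumption: duplicates are ignored
    i = sum(1 for k in node.keys if k < key)   # branch (or slot) where key belongs
    if not node.children:                # leaf: put the key in directly
        node.keys.insert(i, key)
    else:
        split = _insert(node.children[i], key)
        if split is None:
            return None
        middle, right = split            # a child split: absorb its middle key
        node.keys.insert(i, middle)
        node.children.insert(i + 1, right)
    if len(node.keys) <= 2:
        return None                      # case 1: still a legal 2-node or 3-node
    # Case 2: three keys. Split into two 2-nodes and pass the middle key up.
    middle = node.keys[1]
    right = Node23(node.keys[2:], node.children[2:])
    node.keys, node.children = node.keys[:1], node.children[:2]
    return middle, right


def insert(root, key):
    """Insert key and return the (possibly new) root."""
    if root is None:
        return Node23([key])
    split = _insert(root, key)
    if split is None:
        return root
    middle, right = split
    return Node23([middle], [root, right])   # the root split: one more level


# The example tree from the diagram above.
root = Node23([7], [
    Node23([3], [Node23([1]), Node23([5])]),
    Node23([11, 15], [Node23([9]), Node23([13]), Node23([17, 19])]),
])
root = insert(root, 18)   # leaf [17 19] splits, then its parent [11 15] splits
```

After this call the split has propagated two levels: 18 moves out of the leaf, 15 moves out of the old parent, and the root becomes a 3-node holding 7 and 15.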

Deletion: To remove a key value from a 2-3 tree, the first step is to search for it. If it is in a leaf, simply remove it. If it is in an internal node, replace it by its predecessor (or successor), which must be in a leaf.

After the removal of the key value, if a leaf becomes empty, there are two possibilities:

1. If it has a 3-node sibling, a key is moved from the sibling to the parent, and the key in the parent is moved into the empty leaf.
2. If it has no 3-node sibling, it is merged with a sibling, together with the separating key from the parent. This process may repeat at the parent. If the root becomes empty, it is removed.

In a 2-3 tree, all leaves are at the same level, and the complexity of major operations is O(log n).

A 2-3-4 tree extends a 2-3 tree by allowing 4-nodes, so each non-leaf node can have 2, 3, or 4 children.

The operations of 2-3-4 trees are similar to those of 2-3 trees.

There is a mapping between a 2-3-4 tree and a Red-Black tree. In the following, a lower case letter represents a key, and an upper case letter represents a subtree.

```
    [|b|]                   b
    /   \                 /   \
   A     C               A     C

   [|b|d|]                  b                 d
   /  |  \                /   \             /   \
  A   C   E              A     d     or    b     E
                              / \         / \
                             C   E       A   C

  [|b|d|f|]                   d
  /  | |  \                 /   \
 A   C E   G               b     f
                          / \   / \
                         A   C E   G
```
Since each node in a 2-3-4 tree corresponds to one black node (plus at most two red nodes) in a Red-Black tree, the height of a 2-3-4 tree corresponds to the number of black nodes on a path from the root to a leaf in the Red-Black tree. Because red nodes can at most double the length of such a path, this number is at least half of the number of nodes visited by a search in the worst case.
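
The mapping in the diagram can be written as a short recursive conversion. The sketch below is only illustrative: a 2-3-4 node is represented as a `(keys, children)` tuple, `RBNode`, `to_red_black`, and `black_height` are names invented here, and for a 3-node it always picks the first of the two shapes shown above (black b with a red right child d).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RBNode:
    key: object
    color: str                        # "black" or "red"
    left: Optional["RBNode"] = None
    right: Optional["RBNode"] = None


def to_red_black(node):
    """Convert a 2-3-4 node given as (keys, children) into a red-black subtree.

    children is an empty list for a leaf.
    """
    keys, children = node
    kids = [to_red_black(c) for c in children] or [None] * (len(keys) + 1)
    if len(keys) == 1:                # 2-node: a single black node
        (b,) = keys
        return RBNode(b, "black", kids[0], kids[1])
    if len(keys) == 2:                # 3-node: black b with a red right child d
        b, d = keys
        return RBNode(b, "black", kids[0], RBNode(d, "red", kids[1], kids[2]))
    b, d, f = keys                    # 4-node: black d with two red children
    return RBNode(d, "black",
                  RBNode(b, "red", kids[0], kids[1]),
                  RBNode(f, "red", kids[2], kids[3]))


def black_height(rb):
    """Count the black nodes on the leftmost path from rb down to a leaf."""
    count = 0
    while rb is not None:
        if rb.color == "black":
            count += 1
        rb = rb.left
    return count


# A 2-3-4 tree with two levels: a 2-node root over a 4-node and a 3-node.
tree = (["d"], [(["a", "b", "c"], []), (["e", "f"], [])])
rb = to_red_black(tree)
```

Here the 2-3-4 tree has two levels and `black_height(rb)` is 2, while the longest root-to-leaf path in the Red-Black tree (d, b, a) visits three nodes.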

2. B-Trees

B-trees are balanced search trees that are optimized for situations when part or all of the tree must be maintained in secondary storage such as a magnetic disk.

If you have a very large database that cannot fit in main memory, you need to store it on disk and find a way to search it efficiently. Disk accesses are very slow compared to memory accesses, so you want a method that involves few disk accesses. If a binary tree is stored in secondary memory (with pointers to disk addresses), each step down the tree can cost a disk access, and searching it takes too long.

The height of a tree determines the number of comparisons and disk accesses we need to make. A binary tree with n keys has a height of at least about log2(n), because each node stores only 1 key and has 2 branches. If we can represent a tree node so that it stores more keys and has more branches, we can store all the keys in a data structure that is not as deep, and thus search it with fewer disk accesses.

A B-tree is a way of doing this. A B-tree with a minimum degree t (t is 2 or more) has the following properties:

1. If a non-leaf node contains n keys, it has n + 1 children.
2. For every internal node (that is neither the root nor a leaf), its number of children is between t and 2t.
3. All leaf nodes are at the same level of the tree.
4. All keys stored in a node are in ascending order.
5. A node with n keys has the following structure:
    {address of B-tree node for keys < key 1}
    {key 1, address of data for key 1, address of B-tree node with keys > key 1 and < key 2}
    {key 2, address of data for key 2, address of B-tree node with keys > key 2 and < key 3}
    {key 3, address of data for key 3, address of B-tree node with keys > key 3 and < key 4}
    . . .
    {key n, address of data for key n, address of B-tree node with keys > key n}

Other versions of the B-tree use "order" to indicate the maximum number of children of a node, or require the number of keys in each node (except the root) to be between d and 2d. In this sense, a 2-3 tree (order 3) and a 2-3-4 tree (order 4) are both special types of B-tree.

Example: an order 5 B-tree (where each non-root node contains 2-4 keys, and each internal node other than the root has 3-5 children)

```
               [|20|30|42|]
               /   |  |   \
     /--------/   /    \   \--------\
    /            /      \            \
[ 10 15 ]  [ 25 28 ]  [ 32 34 ]  [ 44 55 ]
```
Search algorithm:
```
If pointer to node is null
    target key is not in tree - return null
Else
    Search current node
    If target key is found
        return disk address of item with that key
    Else
        If target key is < first key on node
            Search first child of current node
        Else if key k < target key < key k+1
            Search the child stored between key k and key k+1
        Else if target key is > last key on node
            Search last child of current node
```
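
As a rough in-memory counterpart to the pseudocode above, the following Python sketch searches the order 5 example tree. The names `BTreeNode` and `search` are illustrative assumptions, children are held as ordinary object references rather than disk addresses, and the function returns the node and key position instead of a data address.

```python
from bisect import bisect_left


class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                  # keys in ascending order
        self.children = children or []    # empty for a leaf; else len(keys) + 1


def search(node, target):
    """Return (node, index) where target is stored, or None if it is absent."""
    while node is not None:
        # Index of the first key >= target; also the index of the child to follow.
        i = bisect_left(node.keys, target)
        if i < len(node.keys) and node.keys[i] == target:
            return node, i
        if not node.children:             # reached a leaf without finding it
            return None
        node = node.children[i]           # on disk, this step would be one read
    return None


# The order 5 example tree from the diagram above.
root = BTreeNode([20, 30, 42], [
    BTreeNode([10, 15]), BTreeNode([25, 28]),
    BTreeNode([32, 34]), BTreeNode([44, 55]),
])
found = search(root, 34)      # found in the third leaf, at position 1
missing = search(root, 33)    # None: 33 is not in the tree
```

Using binary search inside a node is a design choice of this sketch; a linear scan over the keys matches the pseudocode just as well, since the dominant cost in practice is the number of nodes read.
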
Insertion: Insert can be seen as a search (for the inserting position) followed by the actual insert of the entry. As in binary search trees, we always insert in a leaf node. In a B-tree of order m, a node holds at most m - 1 entries, so when an insertion gives a leaf m entries, we split it into two parts and send the middle entry up to the parent node. In the previous B-tree (m = 5), we can add 2 entries to each leaf node without having to split a leaf node.

Now let's add the following entries: 8, 18, 26, 36, 39, 43

```
                        [|20|30|42|]
                        /   |  |   \
         /-------------/   /    \   \---------------\
        /                 /      \                   \
[ 8 10 15 18 ]  [ 25 26 28 ]  [ 32 34 36 39 ]  [ 43 44 55 ]
```
At this point 2 of the leaf nodes are full and 2 are not. Let's insert in one of the full leaf nodes and see what happens.

Insert 37: 37 is not in the root, so insert it in the child node that contains keys between 31 and 41, inclusive. That node would contain 32 34 36 37 39, which is too big, so split it into two nodes and pass the middle value (36) up.

```
                           [|20|30|36|42|]
                           /   |  |  |   \
        /-----------------/ /-/   |   \-\ \------------\
       /                   /      |      \              \
[ 8 10 15 18 ]  [ 25 26 28 ]  [ 32 34 ]  [ 37 39 ]  [ 43 44 55 ]
```
Now let's insert into the other full node. Remember that all insertions begin at a leaf node. Insert 12 in the leftmost leaf: 8 10 12 15 18 is too big, so split it and pass 12 up.

The parent node becomes 12 20 30 36 42, which is too big, so split it in 2 and pass 30 up. The new tree has 3 levels (it is fine if the root has fewer than m/2 entries). From this example, we see that B-trees actually grow from the bottom up, rather than from the top down.

```
                            [|30|]
                             /  \
                            /    \
                      [|12|20|]  [|36|42|]
      /---------------/  /   |    |   \  \--------------\
     /          /-------/    |    |    \------\          \
    /          /            /      \           \          \
[ 8 10 ]  [ 15 18 ]  [ 25 26 28 ]  [ 32 34 ]  [ 37 39 ]  [ 43 44 55 ]
```
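
The walkthrough above can be replayed with a small Python sketch of bottom-up insertion. It is an illustration under stated assumptions, not a definitive implementation: `Node` and `insert` are hypothetical names, `order` plays the role of m (a node holds at most `order - 1` keys), keys are assumed to be distinct, and nodes live in memory rather than on disk.

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []    # empty for a leaf


def insert(root, key, order=5):
    """Insert key (assumed not already present) and return the possibly new root."""
    # 1. Search down to the leaf, remembering the path taken.
    path, node = [], root
    while node.children:
        i = bisect_left(node.keys, key)
        path.append((node, i))
        node = node.children[i]
    node.keys.insert(bisect_left(node.keys, key), key)

    # 2. Walk back up, splitting every node that now holds `order` keys.
    while len(node.keys) == order:
        mid = order // 2
        middle = node.keys[mid]
        right = Node(node.keys[mid + 1:], node.children[mid + 1:])
        node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
        if not path:                       # the root split: add a level on top
            return Node([middle], [node, right])
        parent, i = path.pop()
        parent.keys.insert(i, middle)      # send the middle key up
        parent.children.insert(i + 1, right)
        node = parent
    return root


# Start from the order 5 example tree and replay the insertions in the text.
root = Node([20, 30, 42], [Node([10, 15]), Node([25, 28]),
                           Node([32, 34]), Node([44, 55])])
for k in [8, 18, 26, 36, 39, 43, 37, 12]:
    root = insert(root, k)
```

After the final insertion of 12, `root.keys` is `[30]` and its two children hold `[12, 20]` and `[36, 42]`, matching the three-level tree in the diagram. The only place where the height changes is the `if not path` branch, which is the "grows from the bottom up" behavior described above.
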
Deletion: For deletion, we can delete a key that is either in a leaf or a non-leaf node. Again, deletion consists of a search, and, if the key is found in the tree, an actual deletion of an entry.

If we delete from a leaf node, there is no problem unless the leaf node is left with fewer than (m-1)/2 entries. Then we have to merge it with an adjacent leaf, together with the key in the parent that separates the two. If the resulting node has too many entries, we have to split it in two and send the middle key up, as for insertion.

Otherwise, if the key we wish to delete is not in a leaf node, we replace it with the next larger entry in the B-tree. This is analogous to finding the leftmost node in the right subtree of a binary tree: follow the pointer to the child just after the key, and then follow all leftmost pointers until you reach a leaf. Replace the key to be deleted with the smallest key in that leaf, and then delete that key from the leaf. Merge with an adjacent leaf if necessary (see the process for deleting from a leaf node).

Let us delete 44 (in a leaf) from the above tree:

```
                            [|30|]
                             /  \
                            /    \
                      [|12|20|]  [|36|42|]
      /---------------/  /   |    |   \  \--------------\
     /          /-------/    |    |    \------\          \
    /          /            /      \           \          \
[ 8 10 ]  [ 15 18 ]  [ 25 26 28 ]  [ 32 34 ]  [ 37 39 ]  [ 43 55 ]
```
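Now delete 18, which is in a leaf. Since the remaining leaf (15) is too small, merge it with its right neighbor and the separating key 20 to get 15 20 25 26 28. That is too big, so split it and move 25 up. The result is like a rotation in an AVL tree.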
```
                            [|30|]
                             /  \
                            /    \
                      [|12|25|]  [|36|42|]
      /---------------/  /   |    |   \  \--------------\
     /          /-------/    |    |    \------\          \
    /          /            /      \           \          \
[ 8 10 ]  [ 15 20 ]    [ 26 28 ]  [ 32 34 ]  [ 37 39 ]  [ 43 55 ]
```
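Delete 36: replace it with its successor 37 and delete the old 37 from its leaf. The remaining leaf (39) is too small, so merge it with 32 34 and the separating key 37 to get 32 34 37 39, which fits in one full node.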
```
                            [|30|]
                             /  \
                            /    \
                      [|12|25|]  [|42|]
      /---------------/  /   |    |   \
     /          /-------/    |    |    \------------\
    /          /            /      \                 \
[ 8 10 ]  [ 15 20 ]    [ 26 28 ]  [ 32 34 37 39 ]  [ 43 55 ]
```
Now the node with 42 has only one key, so it must be merged with its sibling (12 25) and the root key 30. The merged node 12 25 30 42 becomes the new root, and the tree height is reduced by 1.

```
                      [|12|25|30|42|]
      /---------------/  /   |  |   \
     /          /-------/    |   |   \--------------\
    /          /            /     \                  \
[ 8 10 ]  [ 15 20 ]    [ 26 28 ]  [ 32 34 37 39 ]  [ 43 55 ]
```
Every B-tree with n keys has height O(lg n), and every major operation works on a path from the root to a leaf.
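
The height bound can be made precise with a short counting argument. The sketch below uses the minimum degree t from the definition above (every non-root node has at least t - 1 keys), takes n to be the number of keys, and measures the height h in edges from the root; the last convention is an assumption made here, not something these notes fix. The root of a nonempty tree has at least 1 key, so there are at least 2 nodes at depth 1 and at least 2t^(i-1) nodes at depth i, which gives:

```latex
n \;\ge\; 1 + (t-1)\sum_{i=1}^{h} 2\,t^{\,i-1}
  \;=\; 1 + 2(t-1)\,\frac{t^{h}-1}{t-1}
  \;=\; 2\,t^{h} - 1
\quad\Longrightarrow\quad
h \;\le\; \log_t \frac{n+1}{2}
```

So the height is O(log_t n): the larger the minimum degree t, the shallower the tree and the fewer nodes a search has to read.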

3. Fibonacci Heaps

A Fibonacci heap is a collection of trees (not necessarily binary trees) with the min-heap property: the key of a node is greater than or equal to the key of its parent. The root of the whole heap is the root of the tree that has the smallest key.

In a Fibonacci heap, each node x contains a pointer to its parent and a pointer to an arbitrary one of its children. The children of a node are linked together in a circular, doubly linked "child list" in an arbitrary order. The "degree" of a node indicates its number of children.

Defined in this way, certain operations can be carried out in O(1) time. INSERT simply adds the new node as the root of a new tree and updates the heap root when necessary; UNION concatenates the tree-root lists of two heaps and decides the heap root. Complicated structure maintenance only happens after the minimum value (heap root) is removed. After that, the children of the removed node are treated as roots of separate trees, and then trees of the same degree are merged repeatedly until no two tree roots have the same degree.
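
The lazy INSERT and UNION and the merging step after removing the minimum can be sketched as follows. This is a deliberately simplified illustration: `FibNode` and `FibHeap` are names invented here, Python lists stand in for the circular doubly linked lists (so concatenating root lists is not truly O(1) in this sketch), and parent pointers, decrease-key, and deletion of non-root nodes are left out.

```python
class FibNode:
    def __init__(self, key):
        self.key = key
        self.children = []          # plain list instead of a circular linked list

    @property
    def degree(self):
        return len(self.children)


class FibHeap:
    def __init__(self):
        self.roots = []             # roots of the trees in the heap
        self.min = None             # the "heap root": tree root with smallest key

    def insert(self, key):
        node = FibNode(key)
        self.roots.append(node)     # a new one-node tree, no restructuring
        if self.min is None or key < self.min.key:
            self.min = node
        return node

    def union(self, other):
        self.roots.extend(other.roots)     # just join the two root lists
        if other.min is not None and (self.min is None
                                      or other.min.key < self.min.key):
            self.min = other.min

    def extract_min(self):
        smallest = self.min
        if smallest is None:
            return None
        self.roots.remove(smallest)
        self.roots.extend(smallest.children)   # children become separate trees
        self._consolidate()
        return smallest.key

    def _consolidate(self):
        by_degree = {}                          # degree -> the one root kept
        for node in self.roots:
            while node.degree in by_degree:     # two trees of the same degree
                other = by_degree.pop(node.degree)
                if other.key < node.key:
                    node, other = other, node
                node.children.append(other)     # larger root goes under smaller
            by_degree[node.degree] = node
        self.roots = list(by_degree.values())
        self.min = min(self.roots, key=lambda r: r.key, default=None)


heap = FibHeap()
for k in [7, 3, 9, 1, 5, 8]:
    heap.insert(k)                  # six one-node trees, no merging yet
first = heap.extract_min()          # removes 1, then merges equal-degree trees
```

Before the extraction the heap is just six single-node trees. Afterwards the remaining five nodes have been merged into two trees of different degrees, with 3 as the new heap root.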

A Fibonacci heap can also support other operations, like deleting a non-root node, decreasing a key value, etc.

4. Amortized Analysis

Compared to binary heaps and self-balancing BSTs, Fibonacci heaps and B-trees have more "relaxed" shape and order requirements, and some operations are executed in a "lazy" manner, i.e., postponing the work for later operations. Consequently, some operations can take a long time while others are done very quickly.

For this type of structure, it usually makes more sense to analyze the average cost of each operation over a worst-case sequence of operations, which is called the "amortized cost" of the operation. For example, if a stack is implemented as an array with fixed length, push usually takes O(1) time, but the worst case is O(n) when the array is full and space reallocation happens. If the array length is doubled at each reallocation, the latter happens only after the former has happened O(n) times, so the amortized cost is still O(1). Chapter 17 provides a detailed description of this analysis technique.
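
The stack example can be checked by counting the work directly. The sketch below assumes a doubling policy for reallocation (the notes do not fix one) and uses an illustrative class name, `ArrayStack`, that records how many elements are copied in total.

```python
class ArrayStack:
    """Array-backed stack that doubles its capacity when full (assumed policy)."""
    def __init__(self):
        self.capacity = 1
        self.data = [None]
        self.size = 0
        self.copies = 0                 # elements moved during all reallocations

    def push(self, x):
        if self.size == self.capacity:  # full: reallocate, O(n) for this push
            new_data = [None] * (2 * self.capacity)
            for i in range(self.size):
                new_data[i] = self.data[i]
            self.copies += self.size
            self.data, self.capacity = new_data, 2 * self.capacity
        self.data[self.size] = x        # the usual O(1) part
        self.size += 1


n = 1000
stack = ArrayStack()
for i in range(n):
    stack.push(i)
# Reallocations copied 1 + 2 + 4 + ... + 512 = 1023 elements in total.
```

For n = 1000 pushes, the total copying work is 1023 element moves, which is less than 2n. Spread over all n pushes, each push therefore costs O(1) on average, even though a single push can cost O(n).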

Here is a comparison between binary heap and Fibonacci heap: