3223-04-3

CIS 3223. Data Structures and Algorithms

Self-balancing search trees (3)

1. 2-3 tree

The idea behind the 2-3 tree and some related data structures is to extend a binary search tree into a "multi-way" tree. Instead of putting 1 key value and 2 branches into each node, we can put m keys and m+1 branches, and keep them "sorted" as in binary search trees.

A 2-3 tree is a tree where each node contains 1 or 2 key values, and each internal node has one more child than it has keys. A 2-node has 1 key and, unless it is a leaf, 2 children; a 3-node has 2 keys and, unless it is a leaf, 3 children. All leaves are on the same level.

                [|7|]
                /   \
            [|3|]   [|11|15|]
            /  \     /  |  \
         [1]  [5]  [9] [13] [17 19]

Searching a 2-3 tree is similar to searching a binary search tree, except that within each 3-node, in the worst case two comparisons are needed.
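
The search can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the class name `Node23` and the helper `leaf` are made up for the example, and the tree built at the end is the one drawn above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node23:
    keys: List[int]                                         # 1 or 2 keys, increasing
    children: List["Node23"] = field(default_factory=list)  # empty for a leaf


def search(node: Optional[Node23], target: int) -> bool:
    """Return True if target is stored in the subtree rooted at node."""
    while node is not None:
        i = 0
        # Examine the keys of this node from left to right (at most two of them).
        while i < len(node.keys) and target > node.keys[i]:
            i += 1
        if i < len(node.keys) and target == node.keys[i]:
            return True
        # Descend into the branch between keys[i-1] and keys[i]; stop at a leaf.
        node = node.children[i] if node.children else None
    return False


def leaf(*keys: int) -> Node23:
    return Node23(list(keys))


# The example tree from the figure above.
root = Node23([7], [
    Node23([3], [leaf(1), leaf(5)]),
    Node23([11, 15], [leaf(9), leaf(13), leaf(17, 19)]),
])

print(search(root, 13), search(root, 12))
```

The loop visits one node per level, so the number of nodes examined equals the height of the tree.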

Insertion: In a 2-3 tree, a new key value is initially inserted into a leaf, according to the order of a search tree. Then there are the following possibilities:

If the leaf was a 2-node before the insertion, it becomes a 3-node.
If the leaf was a 3-node, it temporarily holds three keys, so it is split into two 2-nodes and the middle value is inserted into the parent. This process may repeat at the parent. If the root splits, a new root is created.

Deletion: To remove a key value from a 2-3 tree, the first step is to search for it. If it is in a leaf, simply remove it. If it is in an internal node, replace it by its predecessor (or successor), which must be in a leaf.

After the removal of the key value, if a leaf becomes empty, there are two possibilities:

If it has a 3-node sibling, a key is moved from the sibling up to the parent, and the separating key in the parent is moved down into the empty leaf.
If it has no 3-node sibling, it is merged with a sibling, together with the separating key from the parent. The parent loses a key in the process, so this may repeat at the parent. If the root becomes empty, it is removed.

In a 2-3 tree, all leaves are at the same level, so the height is O(log n), and search, insertion, and deletion each take O(log n) time.

2. 2-3-4 Trees

A 2-3-4 tree extends a 2-3 tree by allowing 4-nodes, so each non-leaf node can have 2, 3, or 4 children.

The operations of 2-3-4 trees are similar to those of 2-3 trees.

There is a mapping between a 2-3-4 tree and a Red-Black tree. In the following, a lower case letter represents a key, and an upper case letter represents a subtree.

               [|b|]                    b
               /   \                  /   \
              A     C                A     C

              [|b|d|]                   b               d
              /  |  \                 /   \           /   \
             A   C   E               A     d    or   b     E
                                          / \       / \
                                         C   E     A   C

              [|b|d|f|]                   d
              /  | |  \                 /   \
             A   C E   G               b     f
                                     /  \   /  \      
                                    A    C E    G

Since each node in a 2-3-4 tree corresponds to one black node (plus at most two red nodes) in a Red-Black tree, the height of a 2-3-4 tree equals the number of black nodes on a path from the root to a leaf in the Red-Black tree. A red node always has a black parent, so the longest such path has at most twice that many nodes; the number of black nodes is therefore at least half the number of nodes compared in the worst case.
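
The mapping in the figures can be written out as a short Python sketch. The class names `Node234` and `RBNode` and the example keys are assumptions made for illustration; for a 3-node the code picks the left-hand variant of the two drawn above (first key black, second key red).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node234:
    keys: List[str]                                          # 1, 2 or 3 keys
    children: List["Node234"] = field(default_factory=list)  # empty for a leaf


@dataclass
class RBNode:
    key: str
    red: bool
    left: Optional["RBNode"] = None
    right: Optional["RBNode"] = None


def to_red_black(n: Optional[Node234]) -> Optional[RBNode]:
    """Convert a 2-3-4 subtree into the corresponding Red-Black subtree."""
    if n is None:
        return None
    # Converted subtrees A, C, E, G (all None below a leaf).
    kids = [to_red_black(c) for c in n.children] or [None] * (len(n.keys) + 1)
    k = n.keys
    if len(k) == 1:      # 2-node: a single black node
        return RBNode(k[0], False, kids[0], kids[1])
    if len(k) == 2:      # 3-node: black node with one red child
        return RBNode(k[0], False, kids[0], RBNode(k[1], True, kids[1], kids[2]))
    # 4-node: black middle key with two red children
    return RBNode(k[1], False,
                  RBNode(k[0], True, kids[0], kids[1]),
                  RBNode(k[2], True, kids[2], kids[3]))


def height_234(n: Optional[Node234]) -> int:
    return 0 if n is None else 1 + (height_234(n.children[0]) if n.children else 0)


def black_height(r: Optional[RBNode]) -> int:
    """Count black nodes along the leftmost path."""
    count = 0
    while r is not None:
        count += 0 if r.red else 1
        r = r.left
    return count


def height(r: Optional[RBNode]) -> int:
    return 0 if r is None else 1 + max(height(r.left), height(r.right))


tree = Node234(["h"], [Node234(["b", "d", "f"]), Node234(["k", "m"])])
rb = to_red_black(tree)
print(height_234(tree), black_height(rb), height(rb))
```

In this example the 2-3-4 tree has two levels, the Red-Black tree has two black nodes on every root-to-leaf path, and its longest path has three nodes, which is within the factor of two.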

3. B-Trees

B-trees are balanced search trees that are optimized for situations when part or all of the tree must be maintained in secondary storage such as a magnetic disk.

If you have a very large database (millions of items) that cannot fit in main memory, you need to store it on disk and find a way to search it efficiently. Disk accesses are very time-consuming compared to memory accesses: approximately 200K instructions can execute in the time it takes to make one disk access. So you want a method that involves few disk accesses. If a binary tree is stored in secondary memory (with pointers to disk addresses), searching it takes too long, because each node visited may cost one disk access.

The height of a tree is what determines the number of comparisons and disk accesses we need to make. A binary tree with n keys has a height of at least about log2(n), because each node stores only 1 key and has 2 branches. If we can come up with a technique for representing a tree node that stores more keys and has more branches, we could store all the keys in a data structure that is not as deep and thus search it with fewer disk accesses.
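
To put rough numbers on this, the following Python sketch computes the fewest levels a tree needs when every node may have up to `order` branches (and so up to `order - 1` keys). The function name and the sample orders are chosen only for illustration, and the result is a best case in which every node is completely full.

```python
def min_levels(n_keys: int, order: int) -> int:
    """Fewest levels needed to hold n_keys keys with up to `order` branches per node.

    A completely full tree with h levels holds order**h - 1 keys.
    """
    levels = 0
    while order ** levels - 1 < n_keys:
        levels += 1
    return levels


for order in (2, 5, 256):
    print(order, min_levels(1_000_000, order))
```

For one million keys, a binary tree needs at least 20 levels, while a tree with 256 branches per node needs only 3, so a search touches far fewer nodes and therefore far fewer disk blocks.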

A B-tree is a way of doing this. An order-m B-tree has the following properties:

Every node contains at most m-1 keys, and every node except the root contains at least floor((m-1)/2) keys (the rounding down only matters when m is even).
If a non-leaf node contains n keys, it has n+1 children.
All leaf nodes are at the same level of the tree.
All keys stored in a node are in sequential order.
A node with n keys ((m-1)/2 <= n <= m-1) has the following structure:
{address of B-tree node for keys < key 1}
{key 1, address of data for key 1, address of B-tree node with keys > key 1 and < key 2}
{key 2, address of data for key 2, address of B-tree node with keys > key 2 and < key 3}
{key 3, address of data for key 3, address of B-tree node with keys > key 3 and < key 4}
. . .
{key n, address of data for key n, address of B-tree node with keys > key n}

A 2-3 tree is an order-3 B-tree, and a 2-3-4 tree is an order-4 B-tree. Alternative definitions may use "order" to indicate the maximum or minimum number of keys in each node, rather than the maximum number of children.
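
The properties listed above can be expressed as a small Python checker. This is a sketch under assumptions: the names `BTreeNode`, `node`, and `check` are invented for the example, and data addresses are stored as placeholders rather than real disk offsets.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BTreeNode:
    keys: List[int]                                            # sorted keys
    data_addrs: List[Optional[int]]                            # one data address per key
    children: List["BTreeNode"] = field(default_factory=list)  # empty for a leaf


def node(keys, children=()) -> BTreeNode:
    # Placeholder addresses; a real tree would store disk addresses here.
    return BTreeNode(list(keys), [None] * len(keys), list(children))


def check(n: BTreeNode, m: int, is_root: bool = True,
          lo: Optional[int] = None, hi: Optional[int] = None) -> int:
    """Assert the order-m properties below n; return the number of levels."""
    count = len(n.keys)
    assert count <= m - 1                         # at most m-1 keys
    assert is_root or count >= (m - 1) // 2       # minimum fill, except for the root
    assert len(n.data_addrs) == count
    assert n.keys == sorted(n.keys)               # keys in sequential order
    assert all((lo is None or lo < k) and (hi is None or k < hi) for k in n.keys)
    if not n.children:
        return 1
    assert len(n.children) == count + 1           # n keys, n+1 children
    bounds = [lo] + n.keys + [hi]
    depths = {check(c, m, False, bounds[i], bounds[i + 1])
              for i, c in enumerate(n.children)}
    assert len(depths) == 1                       # all leaves on the same level
    return 1 + depths.pop()


two_three = node([7], [node([3], [node([1]), node([5])]),
                       node([11, 15], [node([9]), node([13]), node([17, 19])])])
print(check(two_three, 3))
```

Here the 2-3 tree from section 1 passes the check as an order-3 B-tree with three levels.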

Example: an order-5 B-tree.

                  [|20|30|42|]         
                  /   |  |   \
        /--------/   /    \   \--------\
       /            /      \            \
   [ 10 15 ]  [ 25 28 ]  [ 32 34 ]  [ 44 55 ]

Search algorithm:

  If pointer to node is null
       target key is not in tree - return null
  Else
       Search current node
       If target key is found
            return disk address of item with that key
       Else if target key < first key in node
            Search first child of current node
       Else if key k < target key < key k+1
            Search the child between key k and key k+1
       Else (target key > last key in node)
            Search last child of current node
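
A possible Python rendering of this search is sketched below. The class `BTreeNode`, the helper `mk`, and the fake disk addresses (key times 1000) are assumptions made for the example; the search within a node uses binary search from the standard `bisect` module, which is one reasonable choice when nodes hold many keys.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BTreeNode:
    keys: List[int]
    addrs: List[int]                                           # addrs[i] belongs to keys[i]
    children: List["BTreeNode"] = field(default_factory=list)  # empty for a leaf


def search(node: Optional[BTreeNode], target: int) -> Optional[int]:
    """Return the data address stored with target, or None if it is absent."""
    while node is not None:
        i = bisect_left(node.keys, target)      # first index with keys[i] >= target
        if i < len(node.keys) and node.keys[i] == target:
            return node.addrs[i]
        # children[i] holds the keys between keys[i-1] and keys[i].
        node = node.children[i] if node.children else None
    return None


def mk(keys, children=()) -> BTreeNode:
    # Hypothetical disk addresses, made up for the example.
    return BTreeNode(list(keys), [k * 1000 for k in keys], list(children))


# The order-5 example tree from above.
root = mk([20, 30, 42], [mk([10, 15]), mk([25, 28]), mk([32, 34]), mk([44, 55])])
print(search(root, 34), search(root, 33))
```

Each iteration of the loop corresponds to reading one node, that is, one disk access in the setting described above.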

Insertion: Insertion can be seen as a search (for the insertion position) followed by the actual insertion of the entry. As in binary search trees, we always insert into a leaf node. When a node reaches m entries, one more than the maximum of m-1, we split it into two parts and send the middle entry up to the parent node. In the order-5 B-tree above, each leaf holds 2 keys, so we can add 2 entries to each leaf node without having to split it.

Now let's add the following entries: 8, 18, 26, 36, 39, 43

                          [|20|30|42|]         
                          /   |  |   \
           /-------------/   /    \   \---------------\
          /                 /      \                   \
   [ 8 10 15 18 ]  [ 25 26 28 ]  [ 32 34 36 39 ]  [ 43 44 55 ]

At this point 2 of the leaf nodes are full and 2 are not. Let's insert in one of the full leaf nodes and see what happens.

Insert 37: 37 is not in the root, so insert it into the child node that holds the keys between 30 and 42. That node would then contain 32 34 36 37 39, which is too big (5 keys, where the maximum is 4), so split it into two nodes and pass the middle value (36) up.

                              [|20|30|36|42|]         
                              /   |  |  |   \
           /-----------------/ /-/   |   \-\ \------------\
          /                   /      |      \              \
   [ 8 10 15 18 ]  [ 25 26 28 ]  [ 32 34 ]  [ 37 39 ]  [ 43 44 55 ]

Now let's insert in the other node which is full. Remember that all insertions begin at a leaf node. Insert 12 in leftmost leaf: 8 10 12 15 18 - too big, split and pass 12 up.

The parent node becomes 12 20 30 36 42, which is too big, so split it in two and pass 30 up. Since that node was the root, 30 becomes a new root and the tree now has 3 levels (it is fine for the root to have fewer than (m-1)/2 keys). From this example, we see that B-trees grow from the bottom up rather than from the top down.

                                  [|30|]
                                   /  \
                                  /    \
                           [|12|20|]  [|36|42|]         
           /---------------/  /   |    |   \  \--------------\
          /          /-------/    |    |    \------\          \
         /          /            /      \           \          \
   [ 8 10 ]  [ 15 18 ]  [ 25 26 28 ]  [ 32 34 ]  [ 37 39 ]  [ 43 44 55 ]
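
The insertion walk-through above can be reproduced with a short Python sketch. The names `Node`, `_insert`, and `insert` are illustrative, data addresses are left out to keep the example small, and ignoring duplicate keys is an assumption the notes do not address.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    keys: List[int]
    children: List["Node"] = field(default_factory=list)  # empty for a leaf


def _insert(node: Node, key: int, m: int) -> Optional[Tuple[int, Node]]:
    """Insert key below node; return (middle key, new right node) if node split."""
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return None                                # duplicate: ignored (assumption)
    if not node.children:
        node.keys.insert(i, key)                   # insertions always start at a leaf
    else:
        promoted = _insert(node.children[i], key, m)
        if promoted is None:
            return None
        mid_key, right = promoted
        node.keys.insert(i, mid_key)               # middle key moves up into this node
        node.children.insert(i + 1, right)
    if len(node.keys) < m:
        return None                                # still at most m-1 keys
    mid = len(node.keys) // 2                      # node has m keys: split it
    mid_key = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return mid_key, right


def insert(root: Node, key: int, m: int) -> Node:
    """Insert key into an order-m B-tree and return the (possibly new) root."""
    promoted = _insert(root, key, m)
    if promoted is None:
        return root
    mid_key, right = promoted
    return Node([mid_key], [root, right])          # the tree grows at the top


root = Node([20, 30, 42],
            [Node([10, 15]), Node([25, 28]), Node([32, 34]), Node([44, 55])])
for k in (8, 18, 26, 36, 39, 43, 37, 12):
    root = insert(root, k, 5)
print(root.keys, [c.keys for c in root.children])
```

Running the same sequence of insertions as in the text ends with 30 alone in the root and the two internal nodes 12 20 and 36 42, as in the last figure.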

Deletion: For deletion, we can delete a key that is either in a leaf or a non-leaf node. Again, deletion consists of a search, and, if the key is found in the tree, an actual deletion of an entry.

If we delete from a leaf node, there is no problem unless the leaf is left with fewer than (m-1)/2 entries. In that case we merge it with an adjacent leaf, together with the key in the parent that separates the two. If the resulting node has too many entries, we split it in two and send the middle key up, as for insertion.

If the key we wish to delete is not in a leaf node, we replace it with the next larger entry in the B-tree. This is analogous to finding the leftmost node in the right subtree of a binary search tree: follow the pointer to the child just to the right of the key, and then follow leftmost pointers until you reach a leaf. Replace the key to be deleted with the smallest key in that leaf, and then delete that key from the leaf. Merge with an adjacent leaf if necessary (see the process for deleting from a leaf node).

Let us delete 44 (in a leaf) from the above tree:

                                  [|30|]
                                   /  \
                                  /    \
                           [|12|20|]  [|36|42|]         
           /---------------/  /   |    |   \  \--------------\
          /          /-------/    |    |    \------\          \
         /          /            /      \           \          \
   [ 8 10 ]  [ 15 18 ]  [ 25 26 28 ]  [ 32 34 ]  [ 37 39 ]  [ 43 55 ]

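Now delete 18, which is in a leaf. The leaf is left with only 15, which is too small, so merge it with its right sibling and the separating key 20 to get 15 20 25 26 28. That is too big, so split it and move 25 up. The result resembles a rotation in an AVL tree.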

                                  [|30|]
                                   /  \
                                  /    \
                           [|12|25|]  [|36|42|]         
           /---------------/  /   |    |   \  \--------------\
          /          /-------/    |    |    \------\          \
         /          /            /      \           \          \
   [ 8 10 ]  [ 15 20 ]    [ 26 28 ]  [ 32 34 ]  [ 37 39 ]  [ 43 55 ]

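Delete 36, which is in an internal node: replace it with its successor 37 and delete the old 37 from its leaf. That leaf is left with only 39, so merge it with its left sibling and the separating key (the new 37) to get 32 34 37 39, which fits in one full node.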

                                  [|30|]
                                   /  \
                                  /    \
                           [|12|25|]  [|42|]         
           /---------------/  /   |    |   \  
          /          /-------/    |    |    \----------\ 
         /          /            /      \               \
   [ 8 10 ]  [ 15 20 ]    [ 26 28 ]  [ 32 34 37 39 ]  [ 43 55 ]

Now the node with 42 has only one key, so it must be merged with its sibling (12 25) and the root key 30, giving 12 25 30 42 as the new root node. The tree height is reduced by 1.

                           [|12|25|30|42|]         
           /---------------/  /   |  |   \  
          /          /-------/    |  |    \----------\ 
         /          /            /    \               \
   [ 8 10 ]  [ 15 20 ]    [ 26 28 ]  [ 32 34 37 39 ]  [ 43 55 ]
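
The repair step used in this deletion example (merge with a neighbor and the separating key, then split again if the result is too big) can be sketched in Python. The names `Node` and `fix_underflow` are illustrative. The choice of neighbor is an assumption picked to match the worked example: the sibling with more keys is used, and the left one on a tie.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    keys: List[int]
    children: List["Node"] = field(default_factory=list)  # empty for a leaf


def fix_underflow(parent: Node, i: int, m: int) -> None:
    """Repair parent.children[i] if it has fewer than (m-1)//2 keys."""
    if len(parent.children[i].keys) >= (m - 1) // 2:
        return
    # Adjacent siblings; prefer the one with more keys, the left one on a tie.
    sibs = [s for s in (i - 1, i + 1) if 0 <= s < len(parent.children)]
    s = max(sibs, key=lambda t: len(parent.children[t].keys))
    j = min(i, s)                                  # merge children j and j+1
    left, right = parent.children[j], parent.children[j + 1]
    keys = left.keys + [parent.keys[j]] + right.keys
    kids = left.children + right.children
    if len(keys) <= m - 1:
        # Fits in one node: the parent loses the separating key and one child.
        parent.keys.pop(j)
        parent.children[j:j + 2] = [Node(keys, kids)]
    else:
        # Too big: split again and send the middle key up, as for insertion.
        mid = len(keys) // 2
        parent.keys[j] = keys[mid]
        parent.children[j:j + 2] = [Node(keys[:mid], kids[:mid + 1]),
                                    Node(keys[mid + 1:], kids[mid + 1:])]


# State after 36 was replaced by 37 and the old 37 was removed from its leaf.
left = Node([12, 25], [Node([8, 10]), Node([15, 20]), Node([26, 28])])
right = Node([37, 42], [Node([32, 34]), Node([39]), Node([43, 55])])
root = Node([30], [left, right])

fix_underflow(right, 1, 5)      # leaf [39] merges into [32 34 37 39]
fix_underflow(root, 1, 5)       # node [42] merges with [12 25] and the root key 30
if not root.keys:               # the old root became empty: remove it
    root = root.children[0]
print(root.keys, [c.keys for c in root.children])
```

A full deletion routine would call such a step on the way back up from the leaf after removing the key; only the repair itself is shown here.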
