2-3-4 Trees

COMP2521 20T2 ♢ 2-3-4 Trees [0/16]
❖ Search Cost

Critical factor determining search cost in BSTs

Either way, path length (tree depth) is a critical factor

In a perfectly balanced tree, max path length = log2n

The 2 in the path length is the branching factor  (binary search tree)

What if branching factor > 2?

COMP2521 20T2 ♢ 2-3-4 Trees [1/16]
❖ 2-3-4 Trees

2-3-4 trees have three kinds of nodes

COMP2521 20T2 ♢ 2-3-4 Trees [2/16]
❖ ... 2-3-4 Trees

2-3-4 trees are ordered similarly to BSTs

[Diagram:Pics/2-3-4-order.png]

In a balanced 2-3-4 tree:

2-3-4 trees grow "upwards" from the leaves, via node splitting.
COMP2521 20T2 ♢ 2-3-4 Trees [3/16]
❖ Node splitting

Insertion into a full node causes a split


[Diagram:Pics/2-3-4-split.png]

COMP2521 20T2 ♢ 2-3-4 Trees [4/16]
❖ ... Node splitting


Intermediate stage of insert-split:


[Diagram:Pics/2-3-4-split1.png]

COMP2521 20T2 ♢ 2-3-4 Trees [5/16]
❖ ... Node splitting

Searching in 2-3-4 trees:

Search(tree,item):
|  Input  tree, item
|  Output address of item if found in 2-3-4 tree
|         NULL otherwise
|
|  if tree is empty then
|     return NULL
|  else
|  |  scan tree.data to find i such that
|  |     tree.data[i-1] < item ≤ tree.data[i]
|  |  if item=tree.data[i] then   // item found
|  |     return address of tree.data[i]
|  |  else       // keep looking in relevant subtree
|  |     return Search(tree.child[i],item)
|  |  end if
|  end if

COMP2521 20T2 ♢ 2-3-4 Trees [6/16]
❖ Data Structure

Possible concrete 2-3-4 tree data structure:

typedef struct node {
   int          order;     // 2, 3 or 4
   int          data[3];   // items in node
   struct node *child[4];  // links to subtrees
} node;

[Diagram:Pics/2-3-4-node.png]

COMP2521 20T2 ♢ 2-3-4 Trees [7/16]
❖ ... Data Structure

Finding which branch to follow

// n is a pointer to a (struct node)
int i;
for (i = 0; i < n->order-1; i++) {
   if (item <= n->data[i]) break;
}
// go to the ith subtree, unless item == n->data[i]

[Diagram:Pics/2-3-4-node1.png]

COMP2521 20T2 ♢ 2-3-4 Trees [8/16]
❖ Search Cost Analysis


2-3-4 tree searching cost analysis:

COMP2521 20T2 ♢ 2-3-4 Trees [9/16]
❖ Insertion into 2-3-4 Trees

Insertion algorithm:

COMP2521 20T2 ♢ 2-3-4 Trees [10/16]
❖ ... Insertion into 2-3-4 Trees

Insertion into a 2-node or 3-node:

[Diagram:Pics/2-3-4-add.png]

Insertion into a 4-node (requires a split):

[Diagram:Pics/2-3-4-split.png]

COMP2521 20T2 ♢ 2-3-4 Trees [11/16]
❖ ... Insertion into 2-3-4 Trees

Splitting the root:

[Diagram:Pics/2-3-4-split-root.png]

COMP2521 20T2 ♢ 2-3-4 Trees [12/16]
❖ ... Insertion into 2-3-4 Trees

Building a 2-3-4 tree … 7 insertions:

[Diagram:Pics/2-3-4-insert.png]

COMP2521 20T2 ♢ 2-3-4 Trees [13/16]
❖ ... Insertion into 2-3-4 Trees

Insertion algorithm:

insert(tree,item):
|  Input  2-3-4 tree, item
|  Output tree with item inserted
|
|  if tree is empty then
|     return new node containing item
|  end if
|  node=Search(tree,item)
|  parent=parent of node
|  if node.order < 4 then
|     insert item into node
|     increment node.order
|  else
|  |  promote = node.data[1]     // middle value
|  |  nodeL   = new node containing data[0]
|  |  nodeR   = new node containing data[2]
|  |  delete node
|  |  if item < promote then
|  |     insert(nodeL,item)
|  |  else
|  |     insert(nodeR,item)
|  |  end if
|  |  insert(parent,promote)
|  |  while parent.order=4 do
|  |     continue promote/split upwards
|  |  end while
|  |  if parent is root ∧ parent.order=4 then
|  |     split root, making new root
|  |  end if
|  end if
COMP2521 20T2 ♢ 2-3-4 Trees [14/16]
❖ ... Insertion into 2-3-4 Trees


Insertion cost  (remembering that 2-3-4 trees are balanced ⇒ h = log4n)

Overall insertion cost = O(log n)
COMP2521 20T2 ♢ 2-3-4 Trees [15/16]
❖ 2-3-4 Variations

Variations on 2-3-4 trees …


Variation #1: why stop at 4? why not 2-3-4-5 trees? or M-way trees?


Variation #2: don't have "variable-sized" nodes
COMP2521 20T2 ♢ 2-3-4 Trees [16/16]


Produced: 17 Jun 2020