Week 11

2-3-4 Trees 1/75

2-3-4 trees have variable-size nodes

each node contains 1 ≤ n ≤ 3 Items and n+1 subtrees
ordering as for BST (e.g. all keys in leftmost subtree < smallest key in node)
new values inserted at leaves; all leaves are at the same level
tree grows upward from root via split-promote

[Diagram:Pics/trees/2-3-4-tree-small.png]

... 2-3-4 Trees 2/75

2-3-4 tree implementation:

typedef struct node Node;
typedef struct node *Tree;
struct node {
    int  order;    // 2, 3 or 4
    Item data[3];  // items in node
    Tree child[4]; // links to subtrees
};

Example:

[Diagram:Pics/trees/2-3-4-nodes-small.png]

... 2-3-4 Trees 3/75

Search algorithm:

search(Tree, Key)
{
   if (empty(Tree)) return NOT_FOUND
   // scan root node, looking for key
   if (∃ i, key(data[i]) == Key)
      return Node containing data[i]
   if (Key < key(data[0]))
      return search(child[0],Key)
   if (∃ i, key(data[i]) < Key < key(data[i+1]))
      return search(child[i+1],Key)
   if (Key > key(data[order-2]))  // last item in node
      return search(child[order-1],Key)  // last subtree
   return NOT_FOUND
}
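
The search pseudocode can be sketched in C as follows. This is a minimal sketch, not the course implementation: it assumes each Item is its own int key, and the name search234 is hypothetical. A node of order k holds k-1 items, and child[i] holds the keys lying between data[i-1] and data[i].

```c
#include <assert.h>
#include <stddef.h>

typedef int Item;   // assumption: each Item is its own int key
typedef struct node *Tree;
struct node {
    int  order;    // 2, 3 or 4
    Item data[3];  // order-1 items, in ascending order
    Tree child[4]; // order subtrees (all NULL in a leaf)
};

// Returns the node containing key, or NULL if key is not in the tree.
struct node *search234(Tree t, Item key)
{
    while (t != NULL) {
        assert(t->order >= 2 && t->order <= 4);
        int nitems = t->order - 1;
        int i = 0;
        // skip past the items that are smaller than key
        while (i < nitems && key > t->data[i])
            i++;
        if (i < nitems && key == t->data[i])
            return t;        // found in this node
        t = t->child[i];     // descend into the subtree left of data[i]
    }
    return NULL;
}
```

The loop does at most three comparisons per node and visits one node per level, which matches the O(d) search cost.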

... 2-3-4 Trees 4/75

Insertion algorithm:

insert(Tree, Item)
{
   Node = search(Tree, key(Item))  // leaf node where Item belongs
   Parent = parent of Node
   if (order(Node) < 4)
      insert Item in Node, order++
   else {
      promote = Node.data[1]  // middle value
      NodeL   = new Node containing data[0]
      NodeR   = new Node containing data[2]
      if (key(Item) < key(data[1]))
         insert Item in NodeL
      else
         insert Item in NodeR
      insert promote into Parent
      while (order(Parent) == 4)
         continue promote/split upwards
      if (isRoot(Parent) && order(Parent) == 4)
         split root, making new root
   }
}

... 2-3-4 Trees 5/75

Insertion into a 2-node or 3-node:

[Diagram:Pics/trees/2-3-4-add-small.png]

Insertion into a 4-node (requires a split):

[Diagram:Pics/trees/2-3-4-split-small.png]

... 2-3-4 Trees 6/75

Splitting the root:

[Diagram:Pics/trees/2-3-4-split-root-small.png]

... 2-3-4 Trees 7/75

2-3-4 tree performance ...

Insertion (into tree of depth d) = O(d) comparisons

multiple comparisons in each of d 2-3-4 nodes
along with occasional splitting to shift values between nodes

Search (in tree of depth d) = O(d) comparisons

multiple comparisons in each of d 2-3-4 nodes

Depth of 2-3-4 tree with N items: log₄N ≤ d ≤ log₂N (approximately)

Note that all paths in a 2-3-4 tree have same length d

... 2-3-4 Trees 8/75

Variations on 2-3-4 trees ...

Variation #1: why stop at 4? why not 2-3-4-5 trees? or M-way trees?

allow nodes to hold up to M-1 items, and at least M/2
if each node is a disk-page, then we have a B-tree (databases)
for B-trees, depending on Item size, M > 100/200/400

Variation #2: don't have "variable-sized" nodes

use standard BST nodes, augmented with one extra piece of data
implement similar strategy as 2-3-4 trees → red-black trees.

Red-Black Trees

Red-Black Trees 10/75

Red-black trees are a representation of 2-3-4 trees using BST nodes.

Definition of a red-black tree

a BST in which each node is marked red or black
no two red nodes appear consecutively on any path
a red node corresponds to a 2-3-4 sibling of its parent
a black node corresponds to a 2-3-4 child of its parent

Insertion algorithm: avoids worst case O(n) behaviour

Search algorithm: standard BST search

... Red-Black Trees 11/75

Representing 4-nodes in red-black trees:

[Diagram:Pics/trees/234-rb-nodes-small.png]

Note: some texts colour the links rather than the nodes.

... Red-Black Trees 12/75

Representing 3-nodes in red-black trees (two styles):

[Diagram:Pics/trees/234-rb-nodes2-small.png]

... Red-Black Trees 13/75

Equivalent trees (one 2-3-4, one red-black):

[Diagram:Pics/trees/234-rb-tree2-small.png]

... Red-Black Trees 14/75

Red-black tree implementation:

typedef enum {RED,BLACK} Colr;
typedef struct Node *Link;
typedef struct Node *Tree;
typedef struct Node {
   Item data;   // actual data
   Colr colour; // relationship to parent
   Link left;   // left subtree
   Link right;  // right subtree
} Node;

RED = node is part of the same 2-3-4 node as its parent (sibling)

BLACK = node is a child of the 2-3-4 node containing the parent
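
The colour rules can be checked mechanically. The sketch below assumes int items, and the names blackHeight and isRedBlack are hypothetical. It tests two properties: no red node has a red child, and every path from the root to a NULL link passes the same number of BLACK nodes (the counterpart of all 2-3-4 leaves being at the same level). It does not check BST ordering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int Item;   // assumption: int items
typedef enum {RED, BLACK} Colr;
typedef struct Node {
   Item data;
   Colr colour;
   struct Node *left, *right;
} Node;

// Returns the number of BLACK nodes on every path from t down to NULL,
// or -1 if a red node has a red child or the paths disagree.
int blackHeight(const Node *t)
{
   if (t == NULL) return 0;
   assert(t->colour == RED || t->colour == BLACK);
   if (t->colour == RED) {
      if ((t->left  != NULL && t->left->colour  == RED) ||
          (t->right != NULL && t->right->colour == RED))
         return -1;   // two consecutive red nodes
   }
   int hl = blackHeight(t->left);
   int hr = blackHeight(t->right);
   if (hl < 0 || hr < 0 || hl != hr) return -1;
   return hl + (t->colour == BLACK ? 1 : 0);
}

bool isRedBlack(const Node *t)
{
   // assumption: the root is BLACK, since it heads the top 2-3-4 node
   if (t != NULL && t->colour == RED) return false;
   return blackHeight(t) >= 0;
}
```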

... Red-Black Trees 15/75

Making new nodes requires a colour:

Node *newNode(Item it, Colr c)
{
   Node *new = malloc(sizeof(Node));
   assert(new != NULL);
   new->data = it; new->colour = c;
   new->left = new->right = NULL;
   return new;
}

... Red-Black Trees 16/75

Searching method is standard BST search:

Item *search(Tree t, Key k)
{
   if (t == NULL) return NULL;
   int diff = cmp(k, key(t->data));
   if (diff < 0)
      return search(t->left, k);
   else if (diff > 0)
      return search(t->right, k);
   else // matches
      return &(t->data);
}

Exercise 1: 2-3-4 vs Red-Black Insertion 17/75

Show the 2-3-4 tree resulting from the insertion of:

10  5  9  6  2  4  20  15  18  19  17  12  13  14

Symbol Tables

Symbol Tables 19/75

A symbol table (dictionary) is a collection of items

each item has an identifying key (typically, a string)
primary operations are: insert, search (by key)

SymTab insert(SymTab t, Item it) { ... }
Item *search(SymTab t, Key k) { ... }

Applications of symbol tables:

programming language processors (e.g. compilers, interpreters)
text processing systems (spell-checkers, document retrieval)
implementing Set ADTs (insert = S ∪ {n}, search = n ∈ S)

Key-Indexed Symbol Table 20/75

Consider the following special case:

keys are integers in a relatively small range (e.g. 0..N-1)
symbol data implemented as array of Item (e.g. Item a[N])
indexed by Key values (mapped into valid index) (e.g. it = a[k])

Leads to very efficient representation

Cost(insert) = O(1), Cost(search) = O(1),
Cost(newSTab) = O(N), Cost(count) = O(1),
Cost(get_ith) = O(1), Cost(delete) = O(1)

... Key-Indexed Symbol Table 21/75

Data representation:

[Diagram:Pics/searching/key-indexed-small.png]

Note: UNUSED is distinguished from all other Item values.

Essentially a simple form of hashing (see later).

Exercise 2: Key-to-index Mapping 22/75

Define a function which

takes a key value in the range lo..hi
returns an index value into array of size hi-lo+1
aborts if the "key" value is outside the range lo..hi

E.g. lo == 12, hi == 27, a[16]

indexOf(12) == 0, indexOf(17) == 5,

indexOf(21) == 9, indexOf(27) == 15
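
One possible answer, as a sketch only. It assumes the range is fixed at compile time through two hypothetical macros LO and HI; the exercise leaves open how lo and hi are supplied.

```c
#include <assert.h>

#define LO 12   // assumption: smallest valid key
#define HI 27   // assumption: largest valid key

// Maps a key in LO..HI to an index in 0..HI-LO; aborts on any other key.
int indexOf(int key)
{
   assert(key >= LO && key <= HI);
   return key - LO;
}
```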

Symbol Table Representations 23/75

Symbol tables can be represented in many ways:

key-indexed array (max # items, restricted key space)

key-sorted array (max # items, using binary search)

linked list (unlimited items, sorted list?)

binary search tree (unlimited items, traversal orders)

Costs (assuming N items):

Type	Search costs
	min	max	avg
key-indexed	1	1*	1
sorted array	1	log₂N	log₂N
linked list	1	N	N/2
BSTree	1	N*	log₂N

Hashing

Hashing 25/75

Key-indexed arrays had "perfect" search performance O(1)

but required a dense range of index values
used a fixed-size array (max size ever needed)
bigger array ⇒ more useful but wastes more space

Hashing allows us to approximate this performance, but

allows arbitrary types of keys
map (hash) keys into compact range of index values
store items in array, accessed by index value

... Hashing 26/75

The ideal for key-indexed collections:

courses["COMP3311"] = "Database Systems";
printf("%s\n", courses["COMP3311"]);

Almost as good:

courses[h("COMP3311")] = "Database Systems";
printf("%s\n", courses[h("COMP3311")]);

In practice:

item = {"COMP3311","Database Systems"};
courses = insert(courses, item);
printf("%s\n", search(courses, "COMP3311"));

... Hashing 27/75

To use arbitrary values as keys, we need three things:

set of Key values, each key identifies one Item
an array (of size N) to store Items
a hash function h() of type Key→[0..N-1]
- requirement: if (x == y) then h(x) == h(y)
- requirement: h(x) always returns same value for given x
a collision resolution method
- collision = (x != y && h(x) == h(y))
- collisions are inevitable when dom(Key) >> N

... Hashing 28/75

Generalised ADT for a collection of Items

Interface:

typedef struct CollectionRep *Collection;

Collection newCollection();    // make new empty collection
Item *search(Collection, Key); // find item with key
void insert(Collection, Item); // add item into collection
void delete(Collection, Key);  // drop item with key

Implementation:

typedef struct CollectionRep {
   ... some data structure to hold multiple Items ...
} CollectionRep;

... Hashing 29/75

For hash tables, we make one change to interface:

typedef struct HashTabRep *HashTable;
// make new empty table of size N
HashTable newHashTable(int);
Item *search(HashTable, Key); // find item with key
void insert(HashTable, Item); // add item into collection
void delete(HashTable, Key);  // drop item with key

Implementation:

typedef struct HashTabRep {...Items[N]...} HashTabRep;
... plus ...
int hash(Key k, int N);  // hash function giving 0..N-1

Exercise 3: Hash Lab 30/75

Implement a HashLab which:

allows you to specify
- the size of the hash table array
- the hash function to use (0..3)
- the collision resolution strategy (C,L,D)
loads the hash table with dictionary words
runs performance test by searching for each word
records average # items considered in searches

Hashing 31/75

Hashing is a method for maintaining a collection

via an array of Items (or (Item *)s), e.g. Item *a[N];
with optimal performance: O(1) search/insert/delete

Requires a function to map keys to indexes: hash: Key → 0..N-1

[Diagram:Pics/hashing/hashing-review-small.png]

... Hashing 32/75

Hash table interface:

typedef struct HashTabRep *HashTable;
// make new empty table of size N
HashTable newHashTable(int);
// find item with key
Item *search(HashTable, Key);
// add item into collection
void insert(HashTable, Item);
// drop item with key
void delete(HashTable, Key);

... Hashing 34/75

Example hash table implementation:

typedef struct HashTabRep {
   int  N;       // size of array
   Item **items; // array of (Item *)
} HashTabRep;

HashTable newHashTable(int N)
{
   HashTable new = malloc(sizeof(HashTabRep));
   new->items = malloc(N*sizeof(Item *));
   new->N = N;
   for (int i = 0; i < N; i++)
      { new->items[i] = NULL; }
   return new;
}

... Hashing 35/75

Idealised versions of HashTable operations:

Item *search(HashTable ht, Key k)
{
    int i = hash(k, ht->N);
    return ht->items[i];
}
void insert(HashTable ht, Item it)
{
    int i = hash(key(it), ht->N);
    ht->items[i] = newItem(it);
}
void delete(HashTable ht, Key k)
{
    int i = hash(k, ht->N);
    free(ht->items[i]);
    ht->items[i] = NULL;
}

Hash Functions 36/75

Points to note:

converts Key value to index value [0..N-1]
deterministic (key value k always maps to same value)
use mod function to map hash value to index value
spread key values uniformly over address range
(assumes that keys themselves are uniformly distributed)
as much as possible, h(k) ≠ h(j) if j ≠ k
cost of computing hash function must be cheap

... Hash Functions 37/75

Basic idea behind hash function

int hash(Key key, int N)
{
   int val = convert key to int;
   return val % N;
}

If keys are ints, conversion is easy (identity function)

How to convert keys which are strings? (e.g. "COMP1927" or "9300035")

Exercise 5: Hash Functions (i) 38/75

Consider this potential hash function:

int hash(char *key, int N)
{
    int h = 0; char *c;
    for (c = key; *c != '\0'; c++)
        h = h + *c;
    return h % N;
}

How does this function convert strings to ints?

What are the deficiencies with this function and how can it be improved?
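
One deficiency is easy to observe by experiment. The sketch below repeats the additive hash under the hypothetical name hashSum (with a const parameter and an added precondition check). Because addition ignores character order, any two strings with the same characters collide, and short strings only ever produce small sums, so a large table is barely used.

```c
#include <assert.h>

// Additive hash: sum of the character codes, reduced mod N.
int hashSum(const char *key, int N)
{
    assert(N > 0);
    int h = 0;
    for (const char *c = key; *c != '\0'; c++)
        h = h + *c;
    return h % N;
}
```

For example, with ASCII codes, hashSum("abc", 101) and hashSum("cba", 101) are both 92, and hashSum("zzzzzzzz", 100000) is only 976.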

... Hash Functions 39/75

A slightly more sophisticated hash function

int hash(char *key, int N)
{
   int h = 0;  char *c;
   int a = 127; // a prime number
   for (c = key; *c != '\0'; c++)
      h = (a * h + *c) % N;
   return h;
}

Converts strings into integers in table range.

But poor choice of a (e.g. 128) can result in poor hashing.
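
A sketch of why the multiplier matters, assuming (this is not stated above) that the table size is a power of two such as N = 64. The function name hashMul and the extra parameter a are hypothetical. When a = 128, the term a*h is always a multiple of 64, so every character except the last is discarded.

```c
#include <assert.h>

// Same scheme as above, with the multiplier a passed as a parameter.
int hashMul(const char *key, int N, int a)
{
    assert(N > 0);
    int h = 0;
    for (const char *c = key; *c != '\0'; c++)
        h = (a * h + *c) % N;
    return h;
}
```

With ASCII codes, hashMul("abc", 64, 128) and hashMul("xzc", 64, 128) are both 'c' % 64 = 35, whereas with a = 127 the two keys hash to 34 and 33.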

... Hash Functions 40/75

To use all of value in hash, with suitable "randomization":

int hash(char *key, int N)
{
   int h = 0, a = 31415, b = 21783;
   char *c;
   for (c = key; *c != '\0'; c++) {
      a = a*b % (N-1);
      h = (a * h + *c) % N;
   }
   return h;
}

This approach is known as universal hashing.

... Hash Functions 41/75

A real hash function (from PostgreSQL DBMS):


uint32 hash_any(unsigned char *k, register int keylen, int N)
{
    register uint32 a, b, c, len;
    // set up internal state
    len = keylen;
    a = b = 0x9e3779b9;
    c = 3923095;
    // handle most of the key, in 12-char chunks
    while (len >= 12) {
        a += (k[0] + (k[1] << 8) + (k[2] << 16) + (k[3] << 24));
        b += (k[4] + (k[5] << 8) + (k[6] << 16) + (k[7] << 24));
        c += (k[8] + (k[9] << 8) + (k[10] << 16) + (k[11] << 24));
        mix(a, b, c);
        k += 12; len -= 12;
    }
    // collect any data from remaining bytes into a,b,c
    mix(a, b, c);
    return c % N;
}

... Hash Functions 42/75

Where mix is defined as:


#define mix(a,b,c) \
{ \
  a -= b; a -= c; a ^= (c>>13); \
  b -= c; b -= a; b ^= (a<<8);  \
  c -= a; c -= b; c ^= (b>>13); \
  a -= b; a -= c; a ^= (c>>12); \
  b -= c; b -= a; b ^= (a<<16); \
  c -= a; c -= b; c ^= (b>>5);  \
  a -= b; a -= c; a ^= (c>>3);  \
  b -= c; b -= a; b ^= (a<<10); \
  c -= a; c -= b; c ^= (b>>15); \
}

i.e. scrambles all of the bits from the bytes of the key value

Hash Table ADT 43/75

Enhanced concrete data representation:

#include "Item.h"  // Item has key and data

#define NoItem distinguished Item value

typedef struct HashTabRep {
   Item *items; // array of Items
   int  nslots; // # elements in array  (was called N)
   int  nitems; // # items stored in array
} HashTabRep;

typedef HashTabRep *HashTable;

Exercise 6: NoItem values 44/75

Suggest suitable NoItem values if

keys are integers
keys are strings
items[] is an array of (Item *)

... Hash Table ADT 45/75

Hash table initialisation:

create a Rep and an array, fill array with NoItem

HashTable newHashTable(int N)
{
   HashTabRep *new = malloc(sizeof(HashTabRep));
   assert(new != NULL);
   new->items = malloc(N*sizeof(Item));
   assert(new->items != NULL);
   for (int i = 0; i < N; i++)
      new->items[i] = NoItem;
   new->nitems = 0; new->nslots = N;
   return new;
}

... Hash Table ADT 46/75

Search function

Item *search(HashTable ht, Key k) {
   int i = hash(k, ht->nslots);
   if (ht->items[i] == NoItem)
      return NULL;
   else if (key(ht->items[i]) != k)
      return NULL;
   else
      return &(ht->items[i]);
}

... Hash Table ADT 47/75

Functions to maintain hash table:

void insert(HashTable ht, Item it) {
   int i = hash(key(it), ht->nslots);
   if (ht->items[i] == NoItem)
      { ht->items[i] = it; ht->nitems++; }
   else if (key(ht->items[i]) == key(it))
      ht->items[i] = it;  // update
   else { // key(ht->items[i]) != key(it)
      // ... what to do? 
   }
}
void delete(HashTable ht, Key k) {
   int i = hash(k, ht->nslots);
   if (ht->items[i] == NoItem)
      return;
   else if (key(ht->items[i]) == k)
      { ht->items[i] = NoItem; ht->nitems--; }
   else { // key(ht->items[i]) != k
      return;  // no item with key k in table
   }
}

Problems with Hashing 48/75

In ideal scenarios, search cost in hash table is O(1).

Problems with hashing:

hash function relies on size of array (⇒ can't expand)
- changing the size of the array changes the hash function
- could make array larger, but would need to re-insert all Items
items are stored in (effectively) random order
if size(KeySpace) ≫ size(IndexSpace), collisions inevitable
- collision: k != j && hash(k,N) == hash(j,N)
if nitems > nslots, collisions inevitable

Exercise 7: Expanding Hash Table 49/75

Write a function

HashTable expand(HashTable ht) { ... }

which doubles the number of slots in a hash table
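
One possible approach, as a self-contained sketch. It assumes non-negative int items that act as their own keys, NoItem defined as -1, hash(k,N) = k % N, and linear probing for collisions; none of these choices is fixed by the exercise. The key point is that every item must be re-inserted, because its slot depends on the table size.

```c
#include <assert.h>
#include <stdlib.h>

typedef int Item;      // assumption: non-negative int, item is its own key
#define NoItem (-1)

typedef struct HashTabRep {
   Item *items;  // array of Items
   int  nslots;  // # elements in array
   int  nitems;  // # items stored in array
} HashTabRep;
typedef HashTabRep *HashTable;

static int hash(Item k, int N) { return k % N; }

HashTable newHashTable(int N)
{
   HashTable new = malloc(sizeof(HashTabRep));
   assert(new != NULL);
   new->items = malloc(N * sizeof(Item));
   assert(new->items != NULL);
   for (int i = 0; i < N; i++) new->items[i] = NoItem;
   new->nslots = N; new->nitems = 0;
   return new;
}

void insert(HashTable ht, Item it)   // linear probing
{
   assert(ht->nitems < ht->nslots);  // guarantees a free slot exists
   int N = ht->nslots, i = hash(it, N);
   while (ht->items[i] != NoItem && ht->items[i] != it)
      i = (i + 1) % N;
   if (ht->items[i] == NoItem) ht->nitems++;
   ht->items[i] = it;
}

// Builds a table with twice as many slots, re-hashes every item into it,
// frees the old table and returns the new one.
HashTable expand(HashTable ht)
{
   HashTable bigger = newHashTable(2 * ht->nslots);
   for (int i = 0; i < ht->nslots; i++)
      if (ht->items[i] != NoItem)
         insert(bigger, ht->items[i]);
   free(ht->items);
   free(ht);
   return bigger;
}
```

Callers must replace their handle with the result, e.g. ht = expand(ht), since the old table is freed.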

Collision Resolution 50/75

Three approaches to dealing with hash collisions:

allow multiple Items in a single array location
- e.g. array of linked lists (mix of O(1) and O(N))
systematically compute new indexes until find a free slot
- need strategies for computing new indexes (aka probing)
increase the size of the array
- needs a method to "adjust" hash() (e.g. linear hashing)

Separate Chaining 51/75

Solve collisions by having multiple items per array entry.

Make each element the start of linked-list of Items.

[Diagram:Pics/hashing/hash-linked-small.png]

... Separate Chaining 52/75

Concrete data structure for hashing via chaining

typedef struct HashTabRep {
   List *lists; // array of Lists of Items
   int  nslots; // # elements in array
   int  nitems; // # items stored in HashTable
} HashTabRep;

HashTable newHashTable(int N)
{
   HashTabRep *new = malloc(sizeof(HashTabRep));
   assert(new != NULL);
   new->lists = malloc(N*sizeof(List));
   assert(new->lists != NULL);
   for (int i = 0; i < N; i++)
      new->lists[i] = newList();
   new->nslots = N; new->nitems = 0;
   return new;
}

... Separate Chaining 53/75

Using the List ADT, search becomes:

#include "List.h" 
Item *search(HashTable ht, Key k)
{
   int i = hash(k, ht->nslots);
   return ListSearch(ht->lists[i], k);
}

Even without List abstraction, easy to implement.

Using sorted lists gives only small performance gain.

... Separate Chaining 54/75

Other list operations are also simple:

#include "List.h"

void insert(HashTable ht, Item it) {
   Key k = key(it);
   int i = hash(k, ht->nslots);
   ListInsert(ht->lists[i], it);
}
void delete(HashTable ht, Key k) {
   int i = hash(k, ht->nslots);
   ListDelete(ht->lists[i], k);
}

Essentially: select a list; operate on that list.
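
Since the List ADT is not shown here, the following self-contained sketch writes the chains out by hand. It assumes non-negative int items that act as their own keys and hash(k,N) = k % N; the ListNode type is hypothetical. New items are pushed onto the front of their chain (unsorted lists).

```c
#include <assert.h>
#include <stdlib.h>

typedef int Item;   // assumption: non-negative int, item is its own key

typedef struct ListNode {
   Item it;
   struct ListNode *next;
} ListNode;

typedef struct HashTabRep {
   ListNode **lists; // array of chain heads
   int nslots;       // # elements in array
   int nitems;       // # items stored in HashTable
} HashTabRep;
typedef HashTabRep *HashTable;

static int hash(Item k, int N) { return k % N; }

HashTable newHashTable(int N)
{
   HashTable new = malloc(sizeof(HashTabRep));
   assert(new != NULL);
   new->lists = malloc(N * sizeof(ListNode *));
   assert(new->lists != NULL);
   for (int i = 0; i < N; i++) new->lists[i] = NULL;  // empty chains
   new->nslots = N; new->nitems = 0;
   return new;
}

Item *search(HashTable ht, Item k)
{
   ListNode *n = ht->lists[hash(k, ht->nslots)];
   for ( ; n != NULL; n = n->next)
      if (n->it == k) return &n->it;
   return NULL;
}

void insert(HashTable ht, Item it)
{
   if (search(ht, it) != NULL) return;  // already present
   int i = hash(it, ht->nslots);
   ListNode *n = malloc(sizeof(ListNode));
   assert(n != NULL);
   n->it = it;
   n->next = ht->lists[i];              // push on front of chain
   ht->lists[i] = n;
   ht->nitems++;
}

void delete(HashTable ht, Item k)
{
   ListNode **p = &ht->lists[hash(k, ht->nslots)];
   while (*p != NULL && (*p)->it != k)
      p = &(*p)->next;
   if (*p == NULL) return;              // no item with key k
   ListNode *old = *p;
   *p = old->next;                      // unlink from chain
   free(old);
   ht->nitems--;
}
```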

... Separate Chaining 55/75

Cost analysis:

N array entries (slots), M stored items
average list length L = M/N
best case: all lists are same length L
worst case: h(k)=0, one list of length M
searching within a list of length n:
- best: 1, worst: n, average: n/2
if good hash and M≤N, cost is 1
if good hash and M>N, cost is (M/N)/2

Ratio of items/slots is called load α = M/N

Linear Probing 56/75

Collision resolution by finding a new location for Item

hash indicates slot i which is already used
try next slot, then next, until we find a free slot
insert item in available slot

Examples:

[Diagram:Pics/hashing/hash-linear-small.png]

... Hashing 59/75

Possible concrete data representation:

#define NoItem distinguished Item value

typedef struct HashTabRep {
   Item *items; // array of Items
   int  nslots; // # elements in array
   int  nitems; // # items stored in array
} HashTabRep;

typedef HashTabRep *HashTable;

Assume: key(NoItem) matches no real Key value

... Hashing 60/75

Consider a hash table with N slots ...

When using chaining

can store > N items in table
h(x) == h(y) means add x and y into same list

When using a fixed-size array

can store ≤ N items in table (typically ≤ 2N/3)
h(x) == h(y) handled by probing (table scan)

Linear Probing 61/75

... Linear Probing 62/75

Insert function for linear probing:

void insert(HashTable ht, Item it)
{
   assert(ht->nitems < ht->nslots);
   int N = ht->nslots;
   Item *a = ht->items;
   Key k = key(it);
   int i, j, h = hash(k,N);
   for (j = 0; j < N; j++) {
      i = (h+j)%N;
      if (a[i] == NoItem) break;
      if (eq(k,key(a[i]))) break;
   }
   if (a[i] == NoItem) ht->nitems++;
   a[i] = it;
}

... Linear Probing 63/75

Search function for linear probing:

Item *search(HashTable ht, Key k)
{
   int N = ht->nslots;
   Item *a = ht->items;
   int i, j, h = hash(k,N);
   for (j = 0; j < N; j++) {
      i = (h+j)%N;
      if (a[i] == NoItem) return NULL;
      if (eq(k,key(a[i]))) return &(a[i]);
   }
   return NULL;
}

... Linear Probing 64/75

Search cost analysis:

cost to reach first Item is O(1)
subsequent cost depends how much we need to scan
affected by load α = M/N (i.e. how "full" is the table)
Avg Cost for successful search = 0.5*(1 + 1/(1-α))
Avg Cost for unsuccessful search = 0.5*(1 + 1/(1-α)²)

Example costs:

load (α)	1/2	2/3	3/4	9/10
search hit	1.5	2.0	2.5	5.5
search miss	2.5	5.0	8.5	50.5

Assumes reasonably uniform data and good hash function.
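
The two cost formulas can be evaluated directly. A small sketch; the function names hitCost and missCost are hypothetical:

```c
#include <assert.h>

// Average probes for a successful search at load alpha (0 <= alpha < 1).
double hitCost(double alpha)
{
   assert(alpha >= 0.0 && alpha < 1.0);
   return 0.5 * (1.0 + 1.0 / (1.0 - alpha));
}

// Average probes for an unsuccessful search at load alpha (0 <= alpha < 1).
double missCost(double alpha)
{
   assert(alpha >= 0.0 && alpha < 1.0);
   double free = 1.0 - alpha;   // fraction of empty slots
   return 0.5 * (1.0 + 1.0 / (free * free));
}
```

For example, hitCost(0.5) evaluates to 1.5 and missCost(0.5) to 2.5. The miss cost grows with 1/(1-α)², which is why linear-probed tables are usually kept well below full.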

... Linear Probing 65/75

Deletion slightly tricky for linear probing.

Need to ensure no NoItem in middle of "probe path"
(i.e. previously relocated items moved to appropriate location)

[Diagram:Pics/hashing/hash-probe-delete-small.png]

... Linear Probing 66/75

Delete function for linear probing:

void delete(HashTable ht, Key k)
{
   int N = ht->nslots;
   Item *a = ht->items;
   int i, j, h = hash(k,N);
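   for (j = 0; j < N; j++) {
      i = (h+j)%N;
      if (a[i] == NoItem) return; // k not in table
      if (eq(k,key(a[i]))) break;
   }
   if (j == N) return; // scanned whole table, k not in table
   a[i] = NoItem;
   ht->nitems--;
   // clean up probe path: re-insert the items that follow in the cluster
   j = (i+1)%N;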
   while (a[j] != NoItem) {
      Item it = a[j];
      a[j] = NoItem;
      ht->nitems--;
      insert(ht, it);
      j = (j+1)%N;
   }
}

Exercise 8: Linear Probing Example 67/75

Consider a linear-probed hash table

N = 10 table slots, hash(k) = k%10

Show the result of inserting items with these keys

1, 2, 3, 4, 5, 6, 7, 8, 9
15, 6, 20, 3, 17, 14, 33, 5

into an initially empty table

Exercise 9: Alternative Deletion Handling 68/75

To simplify NoItem deletion problem ...

leave key in place after deletion (for search)
flag it as being deleted (for insert)

Give a data structure for this and re-implement functions.
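
One possible data structure, as a sketch only. It assumes a small fixed-size table of non-negative int keys with hash(k) = k % NSLOTS; the Status flag and all names are hypothetical. A deleted slot keeps its key but is marked DELETED: search skips over it and keeps probing, while insert may reuse it.

```c
#include <assert.h>
#include <stdbool.h>

#define NSLOTS 10   // assumption: small fixed-size table
typedef enum { EMPTY, USED, DELETED } Status;
typedef struct { int key; Status status; } Slot;
typedef struct { Slot slots[NSLOTS]; int nitems; } Table;  // zeroed => all EMPTY

static int hash(int k) { assert(k >= 0); return k % NSLOTS; }

// Returns the slot index holding k, or -1 if k is not in the table.
// DELETED slots do not end the probe path; only EMPTY slots do.
int find(const Table *t, int k)
{
   for (int j = 0; j < NSLOTS; j++) {
      int i = (hash(k) + j) % NSLOTS;
      if (t->slots[i].status == EMPTY) return -1;
      if (t->slots[i].status == USED && t->slots[i].key == k) return i;
   }
   return -1;
}

// Returns false only if k is absent and no slot can be reused.
bool insert(Table *t, int k)
{
   if (find(t, k) >= 0) return true;      // already present
   for (int j = 0; j < NSLOTS; j++) {
      int i = (hash(k) + j) % NSLOTS;
      if (t->slots[i].status != USED) {   // EMPTY or DELETED: reusable
         t->slots[i].key = k;
         t->slots[i].status = USED;
         t->nitems++;
         return true;
      }
   }
   return false;
}

void delete(Table *t, int k)
{
   int i = find(t, k);
   if (i < 0) return;
   t->slots[i].status = DELETED;          // probe paths stay intact
   t->nitems--;
}
```

The trade-off is that DELETED slots still lengthen probe paths until they are reused or the table is rebuilt.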

... Linear Probing 69/75

A problem with linear probing: clusters

E.g. insert 5, 6, 15, 16, 7, 17, with hash = k%10

[Diagram:Pics/hashing/clustering-small.png]
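
The example can be traced with a few lines of C. This is a sketch under assumptions: a 10-slot array of non-negative int keys with -1 marking an empty slot, and the hypothetical function probeInsert, which returns the slot where each key ends up.

```c
#include <assert.h>

#define TABSIZE 10
#define EMPTY (-1)

// Inserts key k by linear probing with hash = k % TABSIZE.
// Returns the slot used, or -1 if the table is full.
int probeInsert(int table[TABSIZE], int k)
{
   assert(k >= 0);
   for (int j = 0; j < TABSIZE; j++) {
      int i = (k % TABSIZE + j) % TABSIZE;
      if (table[i] == EMPTY || table[i] == k) {
         table[i] = k;
         return i;
      }
   }
   return -1;
}
```

Starting from an empty table, inserting 5, 6, 15, 16, 7, 17 in that order fills slots 5, 6, 7, 8, 9, 0. Keys 15, 16, 7 and 17 each land 2 or 3 slots past their hash position, so a single run of occupied slots keeps growing.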

Double Hashing 70/75

Double hashing improves on linear probing:

by using an increment which ...
- is based on a secondary hash of the key
- ensures that all elements are visited
  (can be ensured by using an increment which is relatively prime to N)
tends to eliminate clusters ⇒ shorter probe paths

To generate increments that are relatively prime to N:

set table size to prime e.g. N=127
hash2() in range [1..N1] where N1 < 127 and prime
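
Why the increment must be relatively prime to N can be checked by counting the distinct slots a probe sequence reaches. A sketch; the function slotsVisited and its 1024-slot cap are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

// Counts the distinct slots visited by N probes:
// start, start+incr, start+2*incr, ... (all mod N).
int slotsVisited(int N, int start, int incr)
{
   assert(N > 0 && N <= 1024 && start >= 0 && incr > 0);  // cap is arbitrary
   bool seen[1024] = { false };
   int count = 0;
   for (int j = 0, i = start % N; j < N; j++, i = (i + incr) % N) {
      if (!seen[i]) { seen[i] = true; count++; }
   }
   return count;
}
```

With N = 127 (prime), any increment in 1..126 reaches all 127 slots. With N = 10 and increment 4, only 5 slots are ever probed (0, 4, 8, 2, 6), so an insert could fail even though half the table is free.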

... Double Hashing 71/75

Concrete data structures for hashing via double hashing:

typedef struct HashTabRep {
   Item *items; // array of Items
   int  nslots; // # elements in array
   int  nitems; // # items stored in HashTable
   int  nhash2; // second hash mod
} HashTabRep;

#define hash2(k,N2) (((k)%(N2))+1)

HashTable newHashTable(int N)
{
   HashTabRep *new = malloc(sizeof(HashTabRep));
   assert(new != NULL);
   new->items = malloc(N*sizeof(Item));
   assert(new->items != NULL);
   for (int i = 0; i < N; i++)
      new->items[i] = NoItem;
   new->nslots = N; new->nitems = 0;
   new->nhash2 = findSuitablePrime(N);
   return new;
}

... Double Hashing 72/75

Search function for double hashing:

Item *search(HashTable ht, Key k)
{
   int N = ht->nslots;
   Item *data = ht->items;
   int i, j, h = hash(k,N);
   int incr = hash2(k,ht->nhash2);
   for (j = 0, i = h; j < N; j++) {
      if (data[i] == NoItem)
         return NULL; // end of probe path, k not in table
      if (cmp(k,key(data[i])) == 0)
         return &(data[i]);
      i = (i+incr)%N;
   }
   return NULL;
}

... Double Hashing 73/75

Insert function for double hashing:

void insert(HashTable ht, Item it)
{
   int N = ht->nslots;
   Item *data = ht->items;
   Key k = key(it);
   int i, j, h = hash(k,N);
   int incr = hash2(k,ht->nhash2);
   // probe h, h+incr, h+2*incr, ... (mod N), at most N slots
   for (j = 0, i = h; j < N; j++) {
      if (data[i] == NoItem)
         break;
      else if (cmp(k,key(data[i])) == 0)
         break;
      i = (i+incr)%N;
   }
   assert(j != N); // table full
   if (data[i] == NoItem) ht->nitems++;
   data[i] = it;
}

... Double Hashing 74/75

Costs for double hashing:

load (α)	1/2	2/3	3/4	9/10
search hit	1.4	1.6	1.8	2.6
search miss	1.5	2.0	3.0	5.5

Can be significantly better than linear probing

especially if table is heavily loaded

Hashing Summary 75/75

Collision resolution approaches:

chaining: easy to implement, allows α > 1
linear probing: fast if α << 1, complex deletion
double hashing: faster than linear probing, esp for α ≅ 1

Only chaining allows α > 1, but performance degrades once α > 1

Once M exceeds initial choice of N,

need to expand size of array (N)
problem: hash function relies on N,
so changing array size potentially requires rebuilding whole table
dynamic hashing methods exist to avoid this

Produced: 9 Oct 2017