Week 11
2-3-4 Trees | 1/75 |
2-3-4 trees have variable-size nodes
Item
... 2-3-4 Trees | 2/75 |
2-3-4 tree implementation:
typedef struct node Node; typedef struct node *Tree; struct node { int order;// 2, 3 or 4 Item data[3];// items in node Tree child[4];// links to subtrees };
Example:
... 2-3-4 Trees | 3/75 |
Search algorithm:
search(Tree, Key) { if (empty(Tree)) return NOT_FOUND// scan root node, looking for key if (∃ i, key(data[i]) == Key) return Node containing data[i] if (Key < key(data[0])) return search(child[0],Key) if (∃ i, key(data[i]) < Key < key(data[i+1])) return search(child[i],Key) if (Key > key(data[order-1])) return search(child[N],Key) return NOT_FOUND }
... 2-3-4 Trees | 4/75 |
Insertion algorithm:
insert(Tree, Item) { Node = search(Tree, key(Item) Parent = parent of Node if (order(Node) < 4) insert Item in Node, order++ else { promote = Node.data[1]// middle value NodeL = new Node containing data[0] NodeR = new Node containing data[2] if (key(Item) < key(data[1])) insert Item in NodeL else insert Item in NodeR insert promote into Parent while (order(Parent) == 4) continue promote/split upwards if (isRoot(Parent) && order(Parent) == 4) split root, making new root } }
... 2-3-4 Trees | 5/75 |
Insertion into a 2-node or 3-node:
Insertion into a 4-node (requires a split):
... 2-3-4 Trees | 6/75 |
Splitting the root:
... 2-3-4 Trees | 7/75 |
2-3-4 tree performance ...
Insertion (into tree of depth d) = O(d) comparisons
Note that all paths in a 2-3-4 tree have same length d
... 2-3-4 Trees | 8/75 |
Variations on 2-3-4 trees ...
Variation #1: why stop at 4? why not 2-3-4-5 trees? or M-way trees?
Item
Red-Black Trees |
Red-Black Trees | 10/75 |
Red-black trees are a representation of 2-3-4 trees using BST nodes.
Definition of a red-black tree
Search algorithm: standard BST search
... Red-Black Trees | 11/75 |
Representing 4-nodes in red-black trees:
Note: some texts colour the links rather than the nodes.
... Red-Black Trees | 12/75 |
Representing 3-nodes in red-black trees (two styles):
... Red-Black Trees | 13/75 |
Equivalent trees (one 2-3-4, one red-black):
... Red-Black Trees | 14/75 |
Red-black tree implementation:
typedef enum {RED,BLACK} Colr; typedef struct Node *Link; typedef struct Node *Tree; typedef struct Node { Item data;// actual data Colr colour;// relationship to parent Link left;// left subtree Link right;// right subtree } Node;
RED
BLACK
... Red-Black Trees | 15/75 |
Making new nodes requires a colour:
Node *newNode(Item it, Colr c) { Node *new = malloc(sizeof(Node)); assert(new != NULL); new->data = it; new->colour = c; new->left = new->right = NULL; return new; }
... Red-Black Trees | 16/75 |
Searching method is standard BST search:
Item *search(Tree t, Key k) { if (t == NULL) return NULL; int diff = cmp(k, key(t->data)); if (diff < 0) return search(t->left, k); else if (diff > 0) return search(t->right, k); else// matches return &(t->data); }
Exercise 1: 2-3-4 vs Red-Black Insertion | 17/75 |
Show the 2-3-4 tree resulting from the insertion of:
10 5 9 6 2 4 20 15 18 19 17 12 13 14
Symbol Tables |
Symbol Tables | 19/75 |
A symbol table (dictionary) is a collection of items
SymTab insert(SymTab t, Item it) { ... } Item *search(SymTab t, Key k) { ... }
Applications of symbol tables:
Key-Indexed Symbol Table | 20/75 |
Consider the following special case:
0..N-1
Item
Item a[N]
Key
it = a[k]
insert
search
newSTab
count
get_ith
delete
... Key-Indexed Symbol Table | 21/75 |
Data representation:
Note: UNUSED
Item
Essentially a simple form of hashing (see later).
Exercise 2: Key-to-index Mapping | 22/75 |
Define a function which
lo == 12, hi == 27, a[16]
indexOf(12) == 0, indexOf(17) = 5,
indexOf(21) == 9, indexOf(27) == 15
Symbol Table Representations | 23/75 |
Symbol tables can be represented in many ways:
Costs (assuming N items):
Key-indexed arrays had "perfect" search performance O(1)
The ideal for key-indexed collections:
Almost as good:
In practice:
To use arbitrary values as keys, we need three things:
Generalised ADT for a collection of
Interface:
Implementation:
For hash tables, we make one change to interface:
Implementation:
Implement a HashLab which:
Hashing is a method for maintaining a collection
Hash table interface:
Implement a HashLab which:
Example hash table implementation:
Idealised versions of HashTable operations:
Points to note:
Basic idea behind hash function
If keys are
How to convert keys which are strings? (e.g.
Consider this potential hash function:
How does this function convert strings to
What are the deficiencies with this function and how can it be improved?
A slightly more sophisticated hash function
Converts strings into integers in table range.
But poor choice of
To use all of value in hash, with suitable "randomization":
This approach is known as universal hashing.
A real hash function (from PostgreSQL DBMS):
Where
i.e. scrambles all of the bits from the bytes of the key value
Enhanced concrete data representation:
Suggest suitable
Hash table initialisation:
Search function
Functions to maintain hash table:
In ideal scenarios, search cost in hash table is O(1).
Problems with hashing:
Write a function
which doubles the number of slots in a hash table
Three approaches to dealing with hash collisions:
Solve collisions by having multiple items per array entry.
Make each element the start of linked-list of Items.
Concrete data structure for hashing via chaining
Using the
Even without
Using sorted lists gives only small performance gain.
Other list operations are also simple:
Essentially: select a list; operate on that list.
Cost analysis:
Collision resolution by finding a new location for
Hashing is a method for maintaining a collection
Hash table interface:
Possible concrete data representation:
Assume:
Consider a hash table with N slots ...
When using chaining
Insert function for linear probing:
Search function for linear probing:
Search cost analysis:
Deletion slightly tricky for linear probing.
Need to ensure no
Delete function for linear probing:
Consider a linear-probed hash table
To simplify
A problem with linear probing: clusters
E.g. insert 5, 6, 15, 16, 7, 17, with hash = k%10
Double hashing improves on linear probing:
Concrete data structures for hashing via double hashing:
Search function for double hashing:
Insert function for double hashing:
Costs for double hashing:
Can be significantly better than linear probing
Collision resolution approaches:
Once M exceeds initial choice of N,
Produced: 9 Oct 2017
Type Search costs min max avg key-indexed 1 1* 1 sorted array 1 log2N log2N linked list 1 N N/2 BSTree 1 N* log2N
Hashing
Hashing 25/75
Hashing allows us to approximate this performance, but
... Hashing 26/75
courses["COMP3311"] = "Database Systems";
printf("%s\n", courses["COMP3311"]);
courses[h("COMP3311")] = "Database Systems";
printf("%s\n", courses[h("COMP3311")]);
item = {"COMP3311","Database Systems"};
courses = insert(courses, item);
printf("%s\n", search(courses, "COMP3311"));
... Hashing 27/75
Key
Item
Item
Key
Key
... Hashing 28/75 Item
typedef struct CollectionRep *Collection;
Collection newCollection();
typedef struct CollectionRep {
... Hashing 29/75
typedef struct HashTabRep *HashTable;
typedef struct HashTabRep {...Items[N]...} HashTabRep;
Exercise 3: Hash Lab 30/75
words
Hashing 31/75
Requires a function to map keys to indexes:
Item
(Item *)
Item *a[N];
hash
... Hashing 32/75
typedef struct HashTabRep *HashTable;
Exercise 4: Hash Lab 33/75
words
... Hashing 34/75
typedef struct HashTabRep {
int N;
... Hashing 35/75
Item *search(HashTable ht, Key k)
{
int i = hash(k, ht->N);
return ht->items[i];
}
void insert(HashTable ht, Item it)
{
int i = hash(key(it), ht->N);
ht->items[i] = newItem(it);
}
void delete(HashTable ht, Key k)
{
int i = hash(k, ht->N);
free(ht->items[i]);
ht->items[i] = NULL;
}
Hash Functions 36/75
Key
mod
(assumes that keys themselves are uniformly distributed)
... Hash Functions 37/75
int hash(Key key, int N)
{
int val = convert key to int;
return val % N;
}
int
"COMP1927"
"9300035"
Exercise 5: Hash Functions (i) 38/75
int hash(char *key, int N)
{
int h = 0; char *c;
for (c = key; *c != '\0'; c++)
h = h + *c;
return h % N;
}
int
... Hash Functions 39/75
int hash(char *key, int N)
{
int h = 0; char *c;
int a = 127;
a
... Hash Functions 40/75
int hash(char *key, int N)
{
int h = 0, a = 31415, b = 21783;
char *c;
for (c = key; *c != '\0'; c++) {
a = a*b % (N-1);
h = (a * h + *c) % N;
}
return h;
}
... Hash Functions 41/75
hash_any(unsigned char *k, register int keylen, int N)
{
register uint32 a, b, c, len;
... Hash Functions 42/75 mix
#define mix(a,b,c) \
{ \
a -= b; a -= c; a ^= (c>>13); \
b -= c; b -= a; b ^= (a<<8); \
c -= a; c -= b; c ^= (b>>13); \
a -= b; a -= c; a ^= (c>>12); \
b -= c; b -= a; b ^= (a<<16); \
c -= a; c -= b; c ^= (b>>5); \
a -= b; a -= c; a ^= (c>>3); \
b -= c; b -= a; b ^= (a<<10); \
c -= a; c -= b; c ^= (b>>15); \
}
Hash Table ADT 43/75
#include "Item.h"
Exercise 6: NoItem values 44/75 NoItem
items[]
(Item *)
... Hash Table ADT 45/75
NoItem
HashTable newHashTable(int N)
{
HashTabRep *new = malloc(sizeof(HashTabRep));
assert(new != NULL);
new->items = malloc(N*sizeof(Item));
assert(new->items != NULL);
for (int i = 0; i < N; i++)
new->items[i] = NoItem;
new->nitems = 0; new->nslots = N;
return new;
}
... Hash Table ADT 46/75
Item *search(HashTable ht, Key k) {
int i = hash(k, ht->nslots);
if (ht->items[i] == NoItem)
return NULL;
else if (key(ht->items[i]) != k)
return NULL;
else
return &(ht->items[i]);
}
... Hash Table ADT 47/75
void insert(HashTable ht, Item it) {
int i = hash(key(it), ht->nslots);
if (ht->items[i] == NoItem)
{ ht->items[i] = it; ht->nitems++; }
else if (key(ht->items[i] == key(it))
ht->items[i] = it;
Problems with Hashing 48/75
Item
nitems
nslots
Exercise 7: Expanding Hash Table 49/75
HashTable expand(HashTable ht) { ... }
Collision Resolution 50/75
Item
hash()
Separate Chaining 51/75
... Separate Chaining 52/75
typedef struct HashTabRep {
List *lists; // array of Lists of Items
int nslots; // # elements in array
int nitems; // # items stored in HashTable
} HashTabRep;
HashTable newHashTable(int N)
{
HashTabRep *new = malloc(sizeof(HashTabRep));
assert(new != NULL);
new->lists = malloc(N*sizeof(List));
assert(new->lists != NULL);
for (int i = 0; i < N; i++)
new->lists[i] = newList();
new->nslots = N; new->nitems = 0;
return new;
}
... Separate Chaining 53/75 List
#include "List.h"
Item *search(HashTable ht, Key k)
{
int i = hash(k, ht->nslots);
return ListSearch(ht->lists[i], k);
}
List
... Separate Chaining 54/75
#include "List.h"
void insert(HashTable ht, Item it) {
Key k = key(it);
int i = hash(k, ht->nslots);
ListInsert(ht->lists[i], it);
}
void delete(HashTable ht, Key k) {
int i = hash(k, ht->nslots);
ListDelete(ht->lists[i], k);
}
... Separate Chaining 55/75
Ratio of items/slots is called load α = M/N
Linear Probing 56/75 Item
Examples:
Hashing 57/75
Requires a function to map keys to indexes:
Item
(Item *)
Item *a[N];
hash
... Hashing 58/75
typedef struct HashTabRep *HashTable;
... Hashing 59/75
#define NoItem distinguished Item value
typedef struct HashTabRep {
Item *items; // array of Items
int nslots; // # elements in array
int nitems; // # items stored in array
} HashTabRep;
typedef HashTabRep *HashTable;
key(NoItem)
Key
... Hashing 60/75
When using a fixed-size array
Linear Probing 61/75
... Linear Probing 62/75
void insert(HashTable ht, Item it)
{
assert(ht->nitems < ht->nslots);
int N = ht->nslots;
Item *a = ht->items;
Key k = key(it);
int i, j, h = hash(k,N);
for (j = 0; j < N; j++) {
i = (h+j)%N;
if (a[i] == NoItem) break;
if (eq(k,key(a[i]))) break;
}
if (a[i] == NoItem) ht->nitems++;
a[i] = it;
}
... Linear Probing 63/75
Item *search(HashTable ht, Key k)
{
int N = ht->nslots;
Item *a = ht->items;
int i, j, h = hash(k,N);
for (j = 0; j < N; j++) {
i = (h+j)%N;
if (a[i] == NoItem) return NULL;
if (eq(k,key(a[i]))) return &(a[i]);
}
return NULL;
}
... Linear Probing 64/75
Example costs:
Item
load (α) 1/2 2/3 3/4 9/10 search hit 1.5 2.0 3.0 5.5 search miss 2.5 5.0 8.5 55.5
... Linear Probing 65/75 NoItem
(i.e. previously relocated items moved to appropriate location)
... Linear Probing 66/75
void delete(HashTable ht, Key k)
{
int N = ht->nslots;
Item *a = ht->items;
int i, j, h = hash(k,N);
for (j = 0; j < N; j++) {
i = (h+j)%N;
if (a[i] == NoItem) return;
Exercise 8: Linear Probing Example 67/75
Show the result of inserting items with these keys
into an initially empty table
1, 2, 3, 4, 5, 6, 7, 8, 9
15, 6, 20, 3, 17, 14, 33, 5
Exercise 9: Alternative Deletion Handling 68/75 NoItem
Give a data structure for this and re-implement functions.
... Linear Probing 69/75
Double Hashing 70/75
To generate relatively prime
(can be ensured by using an increment which is relatively prime to N)
hash2()
... Double Hashing 71/75
typedef struct HashTabRep {
Item *items;
... Double Hashing 72/75
Item *search(HashTable ht, Key k)
{
int N = ht->nslots;
Item *data = ht->items;
int i, j, h = hash(k,N);
int incr = hash2(k,ht->nhash2);
for (j = 0, i = h; j < N; j++) {
if (eq(k,key(data[i]) == 0)
return &(data[i]);
i = (i+incr)%N;
}
return NULL;
}
... Double Hashing 73/75
void insert(HashTable ht, Item it)
{
int N = ht->nslots;
Item *data = ht->items;
Key k = key(it);
int i, j, h = hash(k,N);
int incr = hash2(k,ht->nhash2);
for (j = 0; j < N; j += incr) {
ix = (i+j)%N;
if (cmp(k,key(data[ix]) == 0)
break;
else if (data[ix] == NoItem)
break;
}
assert(j != N);
... Double Hashing 74/75
load (α) 1/2 2/3 3/4 9/10 search hit 1.4 1.6 1.8 2.6 search miss 1.5 2.0 3.0 5.5
Hashing Summary 75/75
Only chaining allows α > 1, but performance degrades once α > 1
so changing array size potentially requires rebuiling whole table