Hash Collisions

COMP2521 20T2 ♢ Hash Collisions [0/23]
❖ Hashing: Reminder


Goal is to use keys as indexes, e.g.

courses["COMP3311"] = "Database Systems";
printf("%s\n", courses["COMP3311"]);

Since strings can't be indexes in C, use via a hash function, e.g.

courses[h("COMP3311")] = "Database Systems";
printf("%s\n", courses[h("COMP3311")]);

Hash function  h  converts keyinteger and uses that as the index.

Problem:   collisions,  where k ≠ j  but  hash(k,N) = hash(j,N)

COMP2521 20T2 ♢ Hash Collisions [1/23]
❖ Collision Resolution


Three approaches to dealing with hash collisions:

COMP2521 20T2 ♢ Hash Collisions [2/23]
❖ Separate Chaining

Solve collisions by having multiple items per array entry.

Make each element the start of linked-list of Items.

[Diagram:Pics/hash-linked.png]

All items in a given list have the same hash() value

COMP2521 20T2 ♢ Hash Collisions [3/23]
❖ ... Separate Chaining

Example of separate chaining ...

[Diagram:Pics/chaining.png]

COMP2521 20T2 ♢ Hash Collisions [4/23]
❖ ... Separate Chaining

Concrete data structure for hashing via chaining

typedef struct HashTabRep {
   List *lists; // array of Lists of Items
   int  N;      // # elements in array
   int  nitems; // # items stored in HashTable
} HashTabRep;

HashTable newHashTable(int N)
{
   HashTabRep *new = malloc(sizeof(HashTabRep));
   assert(new != NULL);
   new->lists = malloc(N*sizeof(List));
   assert(new->lists != NULL);
   for (int i = 0; i < N; i++)
      new->lists[i] = newList();
   new->N = N; new->nitems = 0;
   return new;
}
COMP2521 20T2 ♢ Hash Collisions [5/23]
❖ ... Separate Chaining


Using the List ADT, search becomes:

#include "List.h" 
Item *HashGet(HashTable ht, Key k)
{
   int i = hash(k, ht->N);
   return ListSearch(ht->lists[i], k);
}

Even without List abstraction, easy to implement.

Using sorted lists gives only small performance gain.

COMP2521 20T2 ♢ Hash Collisions [6/23]
❖ ... Separate Chaining

Other list operations are also simple:

#include "List.h"

void HashInsert(HashTable ht, Item it) {
   Key k = key(it);
   int i = hash(k, ht->N);
   ListInsert(ht->lists[i], it);
}
void HashDelete(HashTable ht, Key k) {
   int i = hash(k, ht->N);
   ListDelete(ht->lists[i], k);
}

Essentially: select a list; operate on that list.

COMP2521 20T2 ♢ Hash Collisions [7/23]
❖ ... Separate Chaining

Cost analysis:

Ratio of items/slots is called   load α = M/N
COMP2521 20T2 ♢ Hash Collisions [8/23]
❖ Linear Probing

Collision resolution by finding a new location for Item

Examples:

[Diagram:Pics/hash-linear.png]

COMP2521 20T2 ♢ Hash Collisions [9/23]
❖ ... Linear Probing

Concrete data structures for hashing via linear probing:

typedef struct HashTabRep {
   Item **items; // array of pointers to Items
   int  N;       // # elements in array
   int  nitems;  // # items stored in HashTable
} HashTabRep;

HashTable newHashTable(int N)
{
   HashTabRep *new = malloc(sizeof(HashTabRep));
   assert(new != NULL);
   new->items = malloc(N*sizeof(Item *));
   assert(new->items != NULL);
   for (int i = 0; i < N; i++) new->items[i] = NULL;
   new->N = N; new->nitems = 0;
   return new;
}

COMP2521 20T2 ♢ Hash Collisions [10/23]
❖ ... Linear Probing

Insert function for linear probing:

void HashInsert(HashTable ht, Item it)
{
   assert(ht->nitems < ht->N);
   int N = ht->N;
   Key k = key(it);
   Item **a = ht->items;
   int i = hash(k,N);
   for (int j = 0; j < N; j++) {
      if (a[i] == NULL) break;
      if (equal(k,key(*(a[i])))) break;
      i = (i+1) % N;
   }
   if (a[i] == NULL) ht->nitems++;
   if (a[i] != NULL) free(a[i]);
   a[i] = copy(it);
}

COMP2521 20T2 ♢ Hash Collisions [11/23]
❖ ... Linear Probing

Search function for linear probing:

Item *HashGet(HashTable ht, Key k)
{
   int N = ht->N;
   Item **a = ht->items;
   int i = hash(k,N);
   for (int j = 0; j < N; j++) {
      if (a[i] == NULL) break;
      if (equal(k,key(*(a[i]))))
         return a[i];
      i = (i+1) % N;
   }
   return NULL;
}

COMP2521 20T2 ♢ Hash Collisions [12/23]
❖ ... Linear Probing

Search cost analysis:

Example costs (assuming large table, e.g. N>100 ):
load (α)0.50  0.67  0.75  0.90
search hit1.52.03.05.5
search miss2.55.08.555.5
Assumes reasonably uniform data and good hash function.
COMP2521 20T2 ♢ Hash Collisions [13/23]
❖ ... Linear Probing

Deletion slightly tricky for linear probing.

Need to ensure no NULL in middle of "probe path"
(i.e. previously relocated items moved to appropriate location)

[Diagram:Pics/hash-probe-delete.png]

COMP2521 20T2 ♢ Hash Collisions [14/23]
❖ ... Linear Probing

Delete function for linear probing:

void HashDelete(HashTable ht, Key k)
{
   int N = ht->N;
   Item *a = ht->items;
   int i = hash(k,N);
   for (int j = 0; j < N; j++) {
      if (a[i] == NULL) return; // k not in table
      if (equal(k,key(*(a[i])))) break;
      i = (i+1) % N;
   }
   free(a[i]);  a[i] = NULL;  ht->nitems--;
   // clean up probe path
   i = (i+1) % N;
   while (a[i] != NULL) {
      Item it = *(a[i]);
      a[i] = NULL;  // remove 'it'
      ht->nitems--;
      HashInsert(ht, it); // insert 'it' again
      i = (i+1) % N;
   }
}

COMP2521 20T2 ♢ Hash Collisions [15/23]
❖ ... Linear Probing

Linear probing example:

[Diagram:Pics/probing.png]

COMP2521 20T2 ♢ Hash Collisions [16/23]
❖ ... Linear Probing

A problem with linear probing: clusters

E.g. insert  5,  6,  15,  16,  14,  25,  with hash(k) = k%10

[Diagram:Pics/clustering.png]

COMP2521 20T2 ♢ Hash Collisions [17/23]
❖ Double Hashing

Double hashing improves on linear probing:

To generate relatively prime
COMP2521 20T2 ♢ Hash Collisions [18/23]
❖ ... Double Hashing

Concrete data structures for hashing via double hashing:

typedef struct HashTabRep {
   Item **items; // array of pointers to Items
   int  N;       // # elements in array
   int  nitems;  // # items stored in HashTable
   int  nhash2;  // second hash mod
} HashTabRep;

#define hash2(k,N2) (((k)%N2)+1)

HashTable newHashTable(int N)
{
   HashTabRep *new = malloc(sizeof(HashTabRep));
   assert(new != NULL);
   new->items = malloc(N*sizeof(Item *));
   assert(new->items != NULL);
   for (int i = 0; i < N; i++)
      new->items[i] = NULL;
   new->N = N; new->nitems = 0;
   new->nhash2 = findSuitablePrime(N);
   return new;
}

COMP2521 20T2 ♢ Hash Collisions [19/23]
❖ ... Double Hashing

Search function for double hashing:

Item *HashGet(HashTable ht, Key k)
{
   Item **a = ht->items;
   int N = ht->N;
   int i = hash(k,N);
   int incr = hash2(k,ht->nhash2);
   for (int j = 0, j < N; j++) {
      if (a[i] == NULL) break;  // k not found
      if (equal(k,key(*(a[i]))) return a[i];
      i = (i+incr) % N;
   }
   return NULL;
}

COMP2521 20T2 ♢ Hash Collisions [20/23]
❖ ... Double Hashing

Insert function for double hashing:

void HashInsert(HashTable ht, Item it)
{
   assert(ht->nitems < ht->N); // table full
   Item **a = ht->items;
   Key k = key(it);
   int N = ht->N;
   int i = hash(k,N);
   int incr = hash2(k,ht->nhash2);
   for (int j = 0, j < N; j++) {
      if (a[i] == NULL) break;
      if (equal(k,key(*(a[i])))) break;
      i = (i+incr) % N;
   }
   if (a[i] == NULL) ht->nitems++;
   if (a[i] != NULL) free(a[i]);
   a[i] = copy(it);
}

COMP2521 20T2 ♢ Hash Collisions [21/23]
❖ ... Double Hashing

Search cost analysis:

Costs for double hashing (assuming large table, e.g. N>100 ):

load (α)0.5  0.67  0.75  0.90
search hit1.41.61.82.6
search miss1.52.03.05.5

Can be significantly better than linear probing

COMP2521 20T2 ♢ Hash Collisions [22/23]
❖ Hashing Summary

Collision resolution approaches:

Only chaining allows α > 1, but performance poor when α 1

For arrays, once M  exceeds initial choice of N,

COMP2521 20T2 ♢ Hash Collisions [23/23]


Produced: 10 Jul 2020