Hashing

COMP2521 20T2 ♢ Hashing [0/17]
❖ Associative Indexing


Regular array indexing is positional   ([0], [1], [2], ...):

An alternative approach to indexing:
COMP2521 20T2 ♢ Hashing [1/17]
❖ ... Associative Indexing

Difference between positional and associative indexing:

[Diagram:Pics/assoc-indexing.png]

COMP2521 20T2 ♢ Hashing [2/17]
❖ Hashing

Hashing allows us to get close to associative indexing

Ideally, we would like ...

courses["COMP3311"] = "Database Systems";
printf("%s\n", courses["COMP3311"]);

but C doesn't have non-integer index values.

Something almost as good:

courses[h("COMP3311")] = "Database Systems";
printf("%s\n", courses[h("COMP3311")]);

Hash function  h  converts key to integer and uses that as the index.

COMP2521 20T2 ♢ Hashing [3/17]
❖ ... Hashing


What the h (hash) function does

[Diagram:Pics/hashing-review.png]


Converts a key value (of any type) into an integer index.

Sounds good ... in practice, not so simple ...

COMP2521 20T2 ♢ Hashing [4/17]
❖ ... Hashing


Reminder: what we'd like ...

courses[h("COMP3311")] = "Database Systems";
printf("%s\n", courses[h("COMP3311")]);


In practice, we do something like ...

key = "COMP3311";
item = {"COMP3311","Database Systems",...};
courses = HashInsert(courses, key, item);
printf("%s\n", HashGet(courses, "COMP3311"));

COMP2521 20T2 ♢ Hashing [5/17]
❖ ... Hashing

To use arbitrary values as keys, we need ...

A problem: array is size N,  but  dom(Key) N

So, we also need a collision resolution method

COMP2521 20T2 ♢ Hashing [6/17]
❖ Hash Table ADT

Generalised ADT for a collection of Items

Interface:

typedef struct CollectionRep *Collection;

Collection newCollection();    // make new empty collection
Item *search(Collection, Key); // find item with key
void insert(Collection, Item); // add item into collection
void delete(Collection, Key);  // drop item with key

Implementation:

typedef struct CollectionRep {
   ... some data structure to hold multiple Items ...
} CollectionRep;

COMP2521 20T2 ♢ Hashing [7/17]
❖ ... Hash Table ADT

For hash tables, make one change to "standard" Collection interface:

typedef struct HashTabRep *HashTable;
// make new empty table of size N
HashTable newHashTable(int);
// add item into collection
void HashInsert(HashTable, Item);
// find item with key
Item *HashGet(HashTable, Key);
// drop item with key
void HashDelete(HashTable, Key);
// free memory of a HashTable
void dropHashTable(HashTable);

i.e. we specify max # elements that can be stored in the collection

COMP2521 20T2 ♢ Hashing [8/17]
❖ ... Hash Table ADT

Example hash table implementation:

typedef struct HashTabRep {
   Item **items; // array of (Item *)
   int  N;       // size of array
   int  nitems;  // # Items in array
} HashTabRep;

HashTable newHashTable(int N)
{
   HashTable new = malloc(sizeof(HashTabRep));
   new->items = malloc(N*sizeof(Item *));
   new->N = N;
   new->nitems = 0;
   for (int i = 0; i < N; i++) new->items[i] = NULL;
   return new;
}

COMP2521 20T2 ♢ Hashing [9/17]
❖ ... Hash Table ADT

Hash table implementation (cont)

void HashInsert(HashTable ht, Item it) {
   int h = hash(key(it), ht->N);
   // assume table slot empty!?
   ht->items[h] = copy(it);
   ht->nitems++;
}
Item *HashGet(HashTable ht, Key k) {
   int h = hash(k, ht->N);
   Item *itp = ht->items[h];
   if (itp != NULL && equal(key(*itp),k))
      return itp;
   else
      return NULL;
}

key() and copy() come from Item type;    equal() from Key type

COMP2521 20T2 ♢ Hashing [10/17]
❖ ... Hash Table ADT

Hash table implementation (cont)

void HashDelete(HashTable ht, Key k) {
   int h = hash(k, ht->N);
   Item *itp = ht->items[h];
   if (itp != NULL && equal(key(*itp),k)) {
      free(itp);
      ht->items[h] = NULL;
      ht->nitems--;
   }
}
void dropHashTable(HashTable ht) {
   for (int i = 0; i < ht->N; i++) {
      if (ht->items[i] != NULL) free(ht->items[i]);
   }
   free(ht);
}

COMP2521 20T2 ♢ Hashing [11/17]
❖ Hash Functions


Characteristics of hash functions:

COMP2521 20T2 ♢ Hashing [12/17]
❖ ... Hash Functions

Basic mechanism of hash functions:

int hash(Key key, int N)
{
   int val = convert key to 32-bit int;
   return val % N;
}

If keys are ints, conversion is easy (identity function)

How to convert keys which are strings?   (e.g. "COMP1927" or "John")

Definitely prefer that  hash("cat",N)  ≠  hash("dog",N)

Prefer that  hash("cat",N)  ≠  hash("act",N)  ≠  hash("tac",N)

COMP2521 20T2 ♢ Hashing [13/17]
❖ ... Hash Functions

Universal hashing uses entire key, with position info:

int hash(char *key, int N)
{
   int h = 0, a = 31415, b = 21783;
   char *c;
   for (c = key; *c != '\0'; c++) {
      a = a*b % (N-1);
      h = (a * h + *c) % N;
   }
   return h;
}

Has some similarities with RNG.    Aim: "spread" hash values over [0..N-1]

COMP2521 20T2 ♢ Hashing [14/17]
❖ ... Hash Functions

A real hash function (from PostgreSQL DBMS):

hash_any(unsigned char *k, register int keylen, int N)
{
   register uint32 a, b, c, len;
   // set up internal state
   len = keylen;
   a = b = 0x9e3779b9;
   c = 3923095;
   // handle most of the key, in 12-char chunks
   while (len >= 12) {
      a += (k[0] + (k[1] << 8) + (k[2] << 16) + (k[3] << 24));
      b += (k[4] + (k[5] << 8) + (k[6] << 16) + (k[7] << 24));
      c += (k[8] + (k[9] << 8) + (k[10] << 16) + (k[11] << 24));
      mix(a, b, c);
      k += 12; len -= 12;
   }
   // collect any data from remaining bytes into a,b,c
   mix(a, b, c);
   return c % N;
}

COMP2521 20T2 ♢ Hashing [15/17]
❖ ... Hash Functions

Where mix is defined as:

#define mix(a,b,c) \
{ \
  a -= b; a -= c; a ^= (c>>13); \
  b -= c; b -= a; b ^= (a<<8);  \
  c -= a; c -= b; c ^= (b>>13); \
  a -= b; a -= c; a ^= (c>>12); \
  b -= c; b -= a; b ^= (a<<16); \
  c -= a; c -= b; c ^= (b>>5);  \
  a -= b; a -= c; a ^= (c>>3);  \
  b -= c; b -= a; b ^= (a<<10); \
  c -= a; c -= b; c ^= (b>>15); \
}

i.e. scrambles all of the bits from the bytes of the key value

COMP2521 20T2 ♢ Hashing [16/17]
❖ Problems with Hashing

In ideal scenarios, search cost in hash table is O(1).

Problems with hashing:

COMP2521 20T2 ♢ Hashing [17/17]


Produced: 9 Jul 2020