Multi-dimensional Hashing

COMP9315 21T1 ♢ Multi-d Hashing ♢ [0/14]
❖ Hashing and pmr

For a pmr query like

select * from R where a1 = C1 and ... and an = Cn

Can be alleviated using multi-attribute hashing (mah) MA.hashing works in conjunction with any dynamic hashing scheme.
COMP9315 21T1 ♢ Multi-d Hashing ♢ [1/14]
❖ Hashing and pmr (cont)

Multi-attribute hashing parameters:

[Diagram:Pics/select/choice-vector.png]

COMP9315 21T1 ♢ Multi-d Hashing ♢ [2/14]
❖ MA.Hashing Example

Consider relation Deposit(branch,acctNo,name,amount)

Assume a small data file with 8 main data pages (plus overflows).

Hash parameters:    d=3     d1=1    d2=1    d3=1    d4=0

Note that we ignore the amount attribute (d4=0)

Assumes that nobody will want to ask queries like

select * from Deposit where amount=533

Choice vector is designed taking expected queries into account.

COMP9315 21T1 ♢ Multi-d Hashing ♢ [3/14]
❖ MA.Hashing Example (cont)

Choice vector:

[Diagram:Pics/select/choice-vector.png]

This choice vector tells us:

COMP9315 21T1 ♢ Multi-d Hashing ♢ [4/14]
❖ MA.Hashing Example (cont)

Consider the tuple:

branch acctNo name amount
Downtown 101 Johnston 512

Hash value (page address) is computed by:

[Diagram:Pics/select/mahins2.png]

COMP9315 21T1 ♢ Multi-d Hashing ♢ [5/14]
❖ MA.Hashing Hash Functions

Auxiliary definitions:

#define MaxHashSize 32
typedef unsigned int HashVal;

// extracts i'th bit from hash value
#define bit(i,h) (((h) & (1 << (i))) >> (i))

// choice vector elems
typedef struct { int attr, int bit } CVelem;
typedef CVelem ChoiceVec[MaxHashSize];

// hash function for individual attributes
HashVal hash_any(char *val) { ... }

COMP9315 21T1 ♢ Multi-d Hashing ♢ [6/14]
❖ MA.Hashing Hash Functions (cont)

Produce combined d-bit hash value for tuple t :

HashVal hash(Tuple t, ChoiceVec cv, int d)
{
    HashVal h[nAttr(t)+1];  // hash for each attr
    HashVal res = 0, oneBit;
    int     i, a, b;
    for (i = 1; i <= nAttr(t); i++)
        h[i] = hash_any(attrVal(t,i));
    for (i = 0; i < d; i++) {
        a = cv[i].attr;
        b = cv[i].bit;
        oneBit = bit(b, h[a]);
        res = res | (oneBit << i);
    }
    return res;
}

COMP9315 21T1 ♢ Multi-d Hashing ♢ [7/14]
❖ Queries with MA.Hashing

In a partial match query:

E.g.

select amount
from   Deposit
where  branch = 'Brighton' and name = 'Green'

for which we use the shorthand   (Brighton, ?, Green, ?)

COMP9315 21T1 ♢ Multi-d Hashing ♢ [8/14]
❖ Queries with MA.Hashing (cont)

Consider query: select amount from Deposit where name='Green'


[Diagram:Pics/select/mahquery2.png]


Matching tuples must be in pages: 100, 101, 110, 111.

COMP9315 21T1 ♢ Multi-d Hashing ♢ [9/14]
❖ MA.Hashing Query Algorithm

// Builds the partial hash value (e.g. 10*0*1)
// Treats query like tuple with some attr values missing

ns = 0;  // # unknown bits = # stars
for each attribute i in query Q {
    if (hasValue(Q,i)) {
        set d[i] bits in composite hash
            using choice vector and hash(Q,i)
    } else {
        set d[i] *'s in composite hash
            using choice vector
        ns += d[i]
    }
}
...

COMP9315 21T1 ♢ Multi-d Hashing ♢ [10/14]
❖ MA.Hashing Query Algorithm (cont)

...
// Use the partial hash to find candidate pages

r = openRelation("R",READ);
for (i = 0; i < 2ns; i++) {
    P = composite hash
    replace *'s in P
        using i and choice vector
    Buf = getPage(fileOf(r), P);
    for each tuple T in Buf {
        if (T satisfies pmr query)
            add T to results
    }
}

COMP9315 21T1 ♢ Multi-d Hashing ♢ [11/14]
❖ Query Cost for MA.Hashing

Multi-attribute hashing handles a range of query types, e.g.

select * from R where a=1
select * from R where d=2
select * from R where b=3 and c=4
select * from R where a=5 and b=6 and c=7

A relation with n attributes has 2n different query types.

Different query types have different costs   (different no. of *'s)

Cost(Q) = 2s   where s = i ∉ Q di     (alternatively Cost(Q) = i ∉ Q 2di)

Query distribution gives probability pQ of asking each query type Q.

COMP9315 21T1 ♢ Multi-d Hashing ♢ [12/14]
❖ Query Cost for MA.Hashing (cont)

Min query cost occurs when all attributes are used in query

Min Costpmr  =  1

Max query cost occurs when no attributes are specified

Max Costpmr  =  2d  =  b

Average cost is given by weighted sum over all query types:

Avg Costpmr  =  ∑Q pQi ∉ Q 2di

Aim to minimise the weighted average query cost over possible query types

COMP9315 21T1 ♢ Multi-d Hashing ♢ [13/14]
❖ Optimising MA.Hashing Cost

For a given application, useful to minimise Costpmr.

Can be achieved by choosing appropriate values for di   (cv)

Heuristics:

Trade-off: making Qj more efficient makes Qk less efficient.

This is a combinatorial optimisation problem
(solve via standard optimisation techniques e.g. simulated annealing)

COMP9315 21T1 ♢ Multi-d Hashing ♢ [14/14]


Produced: 15 Mar 2021