❖ Hashing and pmr |
For a pmr query like
select * from R where a1 = C1 and ... and an = Cn
❖ Hashing and pmr (cont) |
Multi-attribute hashing parameters:
❖ MA.Hashing Example |
Consider relation Deposit(branch,acctNo,name,amount)
Assume a small data file with 8 main data pages (plus overflows).
Hash parameters: d=3 d1=1 d2=1 d3=1 d4=0
Note that we ignore the amount
Assumes that nobody will want to ask queries like
select * from Deposit where amount=533
Choice vector is designed taking expected queries into account.
❖ MA.Hashing Example (cont) |
Choice vector:
This choice vector tells us:
❖ MA.Hashing Example (cont) |
Consider the tuple:
branch | acctNo | name | amount |
Downtown | 101 | Johnston | 512 |
Hash value (page address) is computed by:
❖ MA.Hashing Hash Functions |
Auxiliary definitions:
#define MaxHashSize 32 typedef unsigned int HashVal; // extracts i'th bit from hash value #define bit(i,h) (((h) & (1 << (i))) >> (i)) // choice vector elems typedef struct { int attr, int bit } CVelem; typedef CVelem ChoiceVec[MaxHashSize]; // hash function for individual attributes HashVal hash_any(char *val) { ... }
❖ MA.Hashing Hash Functions (cont) |
Produce combined d-bit hash value for tuple t :
HashVal hash(Tuple t, ChoiceVec cv, int d)
{
HashVal h[nAttr(t)+1]; // hash for each attr
HashVal res = 0, oneBit;
int i, a, b;
for (i = 1; i <= nAttr(t); i++)
h[i] = hash_any(attrVal(t,i));
for (i = 0; i < d; i++) {
a = cv[i].attr;
b = cv[i].bit;
oneBit = bit(b, h[a]);
res = res | (oneBit << i);
}
return res;
}
❖ Queries with MA.Hashing |
In a partial match query:
select amount from Deposit where branch = 'Brighton' and name = 'Green'
for which we use the shorthand (Brighton, ?, Green, ?)
❖ Queries with MA.Hashing (cont) |
Consider query: select amount from Deposit where name='Green'
Matching tuples must be in pages: 100
101
110
111
❖ MA.Hashing Query Algorithm |
// Builds the partial hash value (e.g. 10*0*1) // Treats query like tuple with some attr values missing ns = 0; // # unknown bits = # stars for each attribute i in query Q { if (hasValue(Q,i)) { set d[i] bits in composite hash using choice vector and hash(Q,i) } else { set d[i] *'s in composite hash using choice vector ns += d[i] } } ...
❖ MA.Hashing Query Algorithm (cont) |
... // Use the partial hash to find candidate pages r = openRelation("R",READ); for (i = 0; i < 2ns; i++) { P = composite hash replace *'s in P using i and choice vector Buf = getPage(fileOf(r), P); for each tuple T in Buf { if (T satisfies pmr query) add T to results } }
❖ Query Cost for MA.Hashing |
Multi-attribute hashing handles a range of query types, e.g.
select * from R where a=1 select * from R where d=2 select * from R where b=3 and c=4 select * from R where a=5 and b=6 and c=7
A relation with n attributes has 2n different query types.
Different query types have different costs
(different no. of *
Cost(Q) = 2s where s = ∑i ∉ Q di (alternatively Cost(Q) = ∏i ∉ Q 2di)
Query distribution gives probability pQ of asking each query type Q.
❖ Query Cost for MA.Hashing (cont) |
Min query cost occurs when all attributes are used in query
Min Costpmr = 1
Max query cost occurs when no attributes are specified
Max Costpmr = 2d = b
Average cost is given by weighted sum over all query types:
Avg Costpmr = ∑Q pQ ∏i ∉ Q 2di
Aim to minimise the weighted average query cost over possible query types
❖ Optimising MA.Hashing Cost |
For a given application, useful to minimise Costpmr.
Can be achieved by choosing appropriate values for di (cv)
Heuristics:
This is a combinatorial optimisation problem
(solve via standard optimisation techniques e.g. simulated annealing)
Produced: 15 Mar 2021