❖ Join Example |
SQL query on student/enrolment database:
select E.subj, S.name from Student S join Enrolled E on (S.id = E.stude) order by E.subj
And its relational algebra equivalent:
Sort[subj] ( Project[subj,name] ( Join[id=stude](Student,Enrolled) ) )
Database: rS = 20000, cS = 20, bS = 1000, rE = 80000, cE = 40, bE = 2000
We are interested only in the cost of Join, with N buffers
❖ Nested Loop Join |
Basic strategy (R.a ⨝ S.b):
Result = {} for each page i in R { pageR = getPage(R,i) for each page j in S { pageS = getPage(S,j) for each pair of tuples tR,tS from pageR,pageS { if (tR.a == tS.b) Result = Result ∪ (tR:tS) } } }
Needs input buffers for R and S, output buffer for "joined" tuples
Terminology: R is outer relation, S is inner relation
Cost = bR . bS ... ouch!
❖ Block Nested Loop Join |
Method (for N memory buffers):
(tR,tS)
❖ Block Nested Loop Join (cont) |
Best-case scenario: bR ≤ N-2
Typical-case scenario: bR > N-2
Cost = bR + bS . ceil(bR/N-2)
Note: always requires rR.rS checks of the join condition
❖ Cost on Example Query |
With N = 12 buffers, and S as outer and E as inner
❖ Block Nested Loop Join |
Why block nested loop join is actually useful in practice ...
Many queries have the form
select * from R join S on (R.i = S.j) where R.x = K
This would typically be evaluated as
Tmp = Sel[x=K](R) Res = Join[i=j](Tmp, S)
If Tmp
❖ Index Nested Loop Join |
A problem with nested-loop join:
Consider Join[i=j](R,S) :
for each tuple r in relation R { use index to select tuples from S where s.j = r.i for each selected tuple s from S { add (r,s) to result } }
❖ Index Nested Loop Join (cont) |
This method requires:
Typical SelS = 1-2 (hashing) .. bq (unclustered index)
Trade-off: rR.SelS vs bR.bS, where bR ≪ rR and SelS ≪ bS
Produced: 28 Mar 2021