Nested-loop Join

COMP9315 21T1 ♢ Nested-loop Join ♢ [0/8]
❖ Join Example

SQL query on student/enrolment database:

select E.subj, S.name
from   Student S join Enrolled E on (S.id = E.stude)
order  by E.subj

And its relational algebra equivalent:

Sort[subj] ( Project[subj,name] ( Join[id=stude](Student,Enrolled) ) )

Database:  rS = 20000,  cS = 20,  bS = 1000,  rE = 80000,  cE = 40,  bE = 2000


We are interested only in the cost of Join,  with N  buffers

COMP9315 21T1 ♢ Nested-loop Join ♢ [1/8]
❖ Nested Loop Join

Basic strategy (R.a ⨝ S.b):

Result = {}
for each page i in R {
   pageR = getPage(R,i)
   for each page j in S {
      pageS = getPage(S,j)
      for each pair of tuples tR,tS
                       from pageR,pageS {
         if (tR.a == tS.b)
            Result = Result ∪ (tR:tS)
}  }  }

Needs input buffers for R and S, output buffer for "joined" tuples

Terminology: R is outer relation, S is inner relation

Cost = bR . bS   ...   ouch!

COMP9315 21T1 ♢ Nested-loop Join ♢ [2/8]
❖ Block Nested Loop Join

Method (for N memory buffers):

[Diagram:Pics/join/blk-nested-loop.png]

COMP9315 21T1 ♢ Nested-loop Join ♢ [3/8]
❖ Block Nested Loop Join (cont)

Best-case scenario: bR ≤ N-2

Cost   =   bR + bS

Typical-case scenario: bR > N-2

Cost   =   bR + bS . ceil(bR/N-2)

Note: always requires rR.rS  checks of the join condition

COMP9315 21T1 ♢ Nested-loop Join ♢ [4/8]
❖ Cost on Example Query

With N = 12 buffers, and S  as outer and E  as inner

With N = 12 buffers, and E  as outer and S  as inner With N = 102 buffers, and S  as outer and E  as inner With N = 102 buffers, and E  as outer and S  as inner
COMP9315 21T1 ♢ Nested-loop Join ♢ [5/8]
❖ Block Nested Loop Join

Why block nested loop join is actually useful in practice ...

Many queries have the form

select *
from   R join S on (R.i = S.j)
where  R.x = K

This would typically be evaluated as

Tmp = Sel[x=K](R)
Res = Join[i=j](Tmp, S)

If Tmp is small may fit in memory (in small #buffers)

COMP9315 21T1 ♢ Nested-loop Join ♢ [6/8]
❖ Index Nested Loop Join

A problem with nested-loop join:

If there is an index on S, we can avoid such repeated scanning.

Consider Join[i=j](R,S) :

for each tuple r in relation R {
    use index to select tuples
        from S where s.j = r.i
    for each selected tuple s from S {
        add (r,s) to result
}   }

COMP9315 21T1 ♢ Nested-loop Join ♢ [7/8]
❖ Index Nested Loop Join (cont)

This method requires:

Cost   =   bR + rR.SelS    (SelS is the cost of performing a select on S).

Typical SelS  =  1-2 (hashing) .. bq (unclustered index)

Trade-off:   rR.SelS  vs  bR.bS,   where   bR ≪ rR and SelS ≪ bS

COMP9315 21T1 ♢ Nested-loop Join ♢ [8/8]


Produced: 28 Mar 2021