Sort-merge Join

COMP9315 21T1 ♢ Sort-merge Join ♢ [0/10]
❖ Sort-Merge Join

Basic approach:

Advantages: Disadvantages:
COMP9315 21T1 ♢ Sort-merge Join ♢ [1/10]
❖ Sort-Merge Join (cont)

Standard merging requires two cursors:

while (r != eof && s != eof) {
    if (r.val ≤ s.val) { output(r.val); next(r); }
    else { output(s.val); next(s); }
}
while (r != eof) { output(r.val); next(r); }
while (s != eof) { output(s.val); next(s); } 

[Diagram:Pics/join/sort-merge-uniq.png]

COMP9315 21T1 ♢ Sort-merge Join ♢ [2/10]
❖ Sort-Merge Join (cont)

Merging for join requires 3 cursors to scan sorted relations:

[Diagram:Pics/join/sort-merge.png]

COMP9315 21T1 ♢ Sort-merge Join ♢ [3/10]
❖ Sort-Merge Join (cont)

Algorithm using query iterators/scanners:

Query ri, si;  Tuple r,s;

ri = startScan("SortedR");
si = startScan("SortedS");
while ((r = nextTuple(ri)) != NULL
       && (s = nextTuple(si)) != NULL) {
    // align cursors to start of next common run
    while (r != NULL && r.i < s.j)
           r = nextTuple(ri);
    if (r == NULL) break;
    while (s != NULL && r.i > s.j)
           s = nextTuple(si);
    if (s == NULL) break;
    // must have (r.i == s.j) here
...

COMP9315 21T1 ♢ Sort-merge Join ♢ [4/10]
❖ Sort-Merge Join (cont)

...
    // remember start of current run in S
    TupleID startRun = scanCurrent(si)
    // scan common run, generating result tuples
    while (r != NULL && r.i == s.j) {
        while (s != NULL and s.j == r.i) {
            addTuple(outbuf, combine(r,s));
            if (isFull(outbuf)) {
                writePage(outf, outp++, outbuf);
                clearBuf(outbuf);
            }
            s = nextTuple(si);
        }
        r = nextTuple(ri);
        setScan(si, startRun);
    }
}

COMP9315 21T1 ♢ Sort-merge Join ♢ [5/10]
❖ Sort-Merge Join (cont)

Buffer requirements:

COMP9315 21T1 ♢ Sort-merge Join ♢ [6/10]
❖ Sort-Merge Join (cont)

Cost of sort-merge join.

Step 1: sort each relation   (if not already sorted):

Step 2: merge sorted relations:
COMP9315 21T1 ♢ Sort-merge Join ♢ [7/10]
❖ Sort-Merge Join on Example

SQL query on student/enrolment database:

select E.subj, S.name
from   Student S join Enrolled E on (S.id = E.stude)
order  by E.subj

And its relational algebra equivalent:

Sort[subj] ( Project[subj,name] ( Join[id=stude](Student,Enrolled) ) )

Database:  rS = 20000,  cS = 20,  bS = 1000,  rE = 80000,  cE = 40,  bE = 2000


We are interested only in the cost of Join,  with N  buffers

COMP9315 21T1 ♢ Sort-merge Join ♢ [8/10]
❖ Sort-Merge Join on Example (cont)

Case 1:   Join[id=stude](Student,Enrolled)

 
Cost = sort(S) + sort(E) + bS + bE
= 2bS(1+log31(bS/32)) + 2bE(1+log31(bE/32)) + bS + bE
= 2×1000×(1+2) + 2×2000×(1+2) + 1000 + 2000
= 6000 + 12000 + 1000 + 2000
= 21,000
COMP9315 21T1 ♢ Sort-merge Join ♢ [9/10]
❖ Sort-Merge Join on Example (cont)

Case 2:   Join[id=stude](Student,Enrolled)

For the above, no re-scans of E runs are ever needed

Cost  =  2,000 + 1,000  =  3,000   (regardless of which relation is outer)

COMP9315 21T1 ♢ Sort-merge Join ♢ [10/10]


Produced: 1 May 2021