❖ Sort-Merge Join |
Basic approach:
(r,s)
❖ Sort-Merge Join (cont) |
Standard merging requires two cursors:
while (r != eof && s != eof) {
if (r.val ≤ s.val) { output(r.val); next(r); }
else { output(s.val); next(s); }
}
while (r != eof) { output(r.val); next(r); }
while (s != eof) { output(s.val); next(s); }
❖ Sort-Merge Join (cont) |
Merging for join requires 3 cursors to scan sorted relations:
r
s
ss
❖ Sort-Merge Join (cont) |
Algorithm using query iterators/scanners:
Query ri, si; Tuple r,s; ri = startScan("SortedR"); si = startScan("SortedS"); while ((r = nextTuple(ri)) != NULL && (s = nextTuple(si)) != NULL) { // align cursors to start of next common run while (r != NULL && r.i < s.j) r = nextTuple(ri); if (r == NULL) break; while (s != NULL && r.i > s.j) s = nextTuple(si); if (s == NULL) break; // must have (r.i == s.j) here ...
❖ Sort-Merge Join (cont) |
... // remember start of current run in S TupleID startRun = scanCurrent(si) // scan common run, generating result tuples while (r != NULL && r.i == s.j) { while (s != NULL and s.j == r.i) { addTuple(outbuf, combine(r,s)); if (isFull(outbuf)) { writePage(outf, outp++, outbuf); clearBuf(outbuf); } s = nextTuple(si); } r = nextTuple(ri); setScan(si, startRun); } }
❖ Sort-Merge Join (cont) |
Buffer requirements:
❖ Sort-Merge Join (cont) |
Cost of sort-merge join.
Step 1: sort each relation (if not already sorted):
❖ Sort-Merge Join on Example |
SQL query on student/enrolment database:
select E.subj, S.name from Student S join Enrolled E on (S.id = E.stude) order by E.subj
And its relational algebra equivalent:
Sort[subj] ( Project[subj,name] ( Join[id=stude](Student,Enrolled) ) )
Database: rS = 20000, cS = 20, bS = 1000, rE = 80000, cE = 40, bE = 2000
We are interested only in the cost of Join, with N buffers
❖ Sort-Merge Join on Example (cont) |
Case 1: Join[id=stude](Student,Enrolled)
Cost | = | sort(S) + sort(E) + bS + bE |
= | 2bS(1+log31(bS/32)) + 2bE(1+log31(bE/32)) + bS + bE | |
= | 2×1000×(1+2) + 2×2000×(1+2) + 1000 + 2000 | |
= | 6000 + 12000 + 1000 + 2000 | |
= | 21,000 |
❖ Sort-Merge Join on Example (cont) |
Case 2: Join[id=stude](Student,Enrolled)
Cost = 2,000 + 1,000 = 3,000 (regardless of which relation is outer)
Produced: 1 May 2021