❖ Sorted Files |
Records stored in file in order of some field k (the sort key).
Makes searching more efficient; makes insertion less efficient
E.g. assume c = 4
❖ Sorted Files (cont) |
In order to mitigate insertion costs, use overflow pages.
Total number of overflow pages = bov.
Average overflow chain length = Ov = bov / b.
Bucket = data page + its overflow page(s)
❖ Selection in Sorted Files |
For one queries on sort key, use binary search.
// select * from R where k = val (sorted on R.k) lo = 0; hi = nPages(rel)-1 while (lo <= hi) { mid = (lo+hi) / 2; // int division with truncation (tup,loVal,hiVal) = searchBucket(rel,mid,x,val); if (tup != NULL) return tup; else if (val < loVal) hi = mid - 1; else if (val > hiVal) lo = mid + 1; else return NOT_FOUND; } return NOT_FOUND;
where rel
mid,lo,hi
k
val,loVal,hiVal
k
❖ Selection in Sorted Files (cont) |
Search a page and its overflow chain for a key value
searchBucket(rel,p,k,val) { get_page(rel,p,buf); (tup,min,max) = searchPage(buf,k,val,+INF,-INF) if (tup != NULL) return(tup,min,max); ovf = openOvFile(f); ovp = ovflow(buf); while (tup == NULL && ovp != NO_PAGE) { get_page(ovf,ovp,buf); (tup,min,max) = searchPage(buf,k,val,min,max) ovp = ovflow(buf); } return (tup,min,max); }
Assumes each page contains index of next page in Ov chain
❖ Selection in Sorted Files (cont) |
Search within a page for key; also find min/max key values
searchPage(buf,k,val,min,max) { res = NULL; for (i = 0; i < nTuples(buf); i++) { tup = get_tuple(buf, i); if (tup.k == val) res = tup; if (tup.k < min) min = tup.k; if (tup.k > max) max = tup.k; } return (res,min,max); }
❖ Selection in Sorted Files (cont) |
The above method treats each bucket like a single large page.
Cases:
Costone : Best = 1 Worst = log2 b + bov
Average case cost analysis needs assumptions (e.g. data distribution)
❖ Selection in Sorted Files (cont) |
For pmr query, on non-unique attribute k, where file is sorted on k
select * from R where k = 2
Begin by locating a page p containing k=
Scan backwards and forwards from p to find matches.
Thus, Costpmr = Costone + (bq-1).(1+Ov)
❖ Selection in Sorted Files (cont) |
For range queries on unique sort key (e.g. primary key):
select * from R where k >= 5 and k <= 13
Costrange = Costone + (bq-1).(1+Ov)
❖ Selection in Sorted Files (cont) |
For range queries on non-unique sort key, similar method to pmr.
select * from R where k >= 2 and k <= 6
Costrange = Costone + (bq-1).(1+Ov)
❖ Selection in Sorted Files (cont) |
So far, have assumed query condition involves sort key k.
But what about select * from R where j = 100.0
If condition contains attribute j, not the sort key
Costone,
Costrange,
Costpmr
as for heap files
❖ Insertion into Sorted Files |
Insertion approach:
Consider insertions of k=33, k=25, k=99 into:
❖ Deletion from Sorted Files |
E.g. delete from R where k = 2
Deletion strategy:
Recall: selectivity determines bq (# pages with matches)
Thus, Costdelete = Costselect + bqw
Produced: 7 Mar 2021