Sorted Files

COMP9315 21T1 ♢ Sorted Files ♢ [0/12]
❖ Sorted Files

Records stored in file in order of some field k (the sort key).

Makes searching more efficient; makes insertion less efficient

E.g. assume c = 4

[Diagram:Pics/file-struct/insert-in-sorted.png]

COMP9315 21T1 ♢ Sorted Files ♢ [1/12]
❖ Sorted Files (cont)

In order to mitigate insertion costs, use overflow pages.

[Diagram:Pics/file-struct/sfile1.png]

Total number of overflow pages = bov.

Average overflow chain length = Ov = bov / b.

Bucket = data page + its overflow page(s)

COMP9315 21T1 ♢ Sorted Files ♢ [2/12]
❖ Selection in Sorted Files

For one queries on sort key, use binary search.

// select * from R where k = val  (sorted on R.k)
lo = 0; hi = nPages(rel)-1
while (lo <= hi) {
    mid = (lo+hi) / 2;  // int division with truncation
    (tup,loVal,hiVal) = searchBucket(rel,mid,x,val);
    if (tup != NULL) return tup;
    else if (val < loVal) hi = mid - 1;
    else if (val > hiVal) lo = mid + 1;
    else return NOT_FOUND;
}
return NOT_FOUND;

where  rel is relation handle,  mid,lo,hi are page indexes,
                k is a field/attr,  val,loVal,hiVal are values for k

COMP9315 21T1 ♢ Sorted Files ♢ [3/12]
❖ Selection in Sorted Files (cont)

Search a page and its overflow chain for a key value

searchBucket(rel,p,k,val)
{
    get_page(rel,p,buf);
    (tup,min,max) = searchPage(buf,k,val,+INF,-INF)
    if (tup != NULL) return(tup,min,max);
    ovf = openOvFile(f);
    ovp = ovflow(buf);
    while (tup == NULL && ovp != NO_PAGE) {
        get_page(ovf,ovp,buf);
        (tup,min,max) = searchPage(buf,k,val,min,max)
        ovp = ovflow(buf);
    }     
    return (tup,min,max);
}

Assumes each page contains index of next page in Ov chain

COMP9315 21T1 ♢ Sorted Files ♢ [4/12]
❖ Selection in Sorted Files (cont)

Search within a page for key; also find min/max key values

searchPage(buf,k,val,min,max)
{
    res = NULL;
    for (i = 0; i < nTuples(buf); i++) {
        tup = get_tuple(buf, i);
        if (tup.k == val) res = tup;
        if (tup.k < min) min = tup.k;
        if (tup.k > max) max = tup.k;
    }
    return (res,min,max);
}

COMP9315 21T1 ♢ Sorted Files ♢ [5/12]
❖ Selection in Sorted Files (cont)

The above method treats each bucket like a single large page.

Cases:

Costone :     Best = 1      Worst = log2 b + bov

Average case cost analysis needs assumptions  (e.g. data distribution)

COMP9315 21T1 ♢ Sorted Files ♢ [6/12]
❖ Selection in Sorted Files (cont)

For pmr query, on non-unique attribute k, where file is sorted on k

E.g. select * from R where k = 2

[Diagram:Pics/file-struct/sfile-pmr.png]

Begin by locating a page p containing k=val   (as for one query).

Scan backwards and forwards from p to find matches.

Thus, Costpmr  =  Costone + (bq-1).(1+Ov)

COMP9315 21T1 ♢ Sorted Files ♢ [7/12]
❖ Selection in Sorted Files (cont)

For range queries on unique sort key (e.g. primary key):

E.g. select * from R where k >= 5 and k <= 13

[Diagram:Pics/file-struct/sfile-range-pk.png]

Costrange  =  Costone + (bq-1).(1+Ov)

COMP9315 21T1 ♢ Sorted Files ♢ [8/12]
❖ Selection in Sorted Files (cont)

For range queries on non-unique sort key, similar method to pmr.

E.g. select * from R where k >= 2 and k <= 6

[Diagram:Pics/file-struct/sfile-range.png]

Costrange = Costone + (bq-1).(1+Ov)

COMP9315 21T1 ♢ Sorted Files ♢ [9/12]
❖ Selection in Sorted Files (cont)

So far, have assumed query condition involves sort key k.

But what about   select * from R where j = 100.0 ?

If condition contains attribute j, not the sort key


Costone,   Costrange,   Costpmr   as for heap files

COMP9315 21T1 ♢ Sorted Files ♢ [10/12]
❖ Insertion into Sorted Files

Insertion approach:

Thus, Costinsert  =  Costone + δw         (where δw = 1 or 2)

Consider insertions of k=33, k=25, k=99 into:

[Diagram:Pics/file-struct/sorted-file1.png]

COMP9315 21T1 ♢ Sorted Files ♢ [11/12]
❖ Deletion from Sorted Files

E.g.   delete from R where k = 2

Deletion strategy:

Cost depends on selectivity of selection condition

Recall: selectivity determines bq   (# pages with matches)

Thus, Costdelete  =  Costselect + bqw

COMP9315 21T1 ♢ Sorted Files ♢ [12/12]


Produced: 7 Mar 2021