Sorting

COMP9315 21T1 ♢ Sorting ♢ [0/14]
❖ The Sort Operation

Sorting is explicit in queries only in the order by clause

select * from Students order by name;

Sorting is used internally in other operations:

Sort methods such as quicksort are designed for in-memory data.

For large data on disks, need external sorts such as merge sort.

COMP9315 21T1 ♢ Sorting ♢ [1/14]
❖ Two-way Merge Sort

Example:

[Diagram:Pics/scansortproj/two-way-ex2.png]

COMP9315 21T1 ♢ Sorting ♢ [2/14]
❖ Two-way Merge Sort (cont)

Requires at least three in-memory buffers:

[Diagram:Pics/scansortproj/two-way-buf.png]


Assumption: cost of Merge operation on two in-memory buffers ≅ 0.

COMP9315 21T1 ♢ Sorting ♢ [3/14]
❖ Comparison for Sorting

Above assumes that we have a function to compare tuples.

Needs to understand ordering on different data types.

Need a function tupCompare(r1,r2,f) (cf. C's strcmp)

int tupCompare(r1,r2,f)
{
   if (r1.f < r2.f) return -1;
   if (r1.f > r2.f) return 1;
   return 0;
}

Assume =, <, > are available for all attribute types.

COMP9315 21T1 ♢ Sorting ♢ [4/14]
❖ Comparison for Sorting (cont)

In reality, need to sort on multiple attributes and ASC/DESC, e.g.

-- example multi-attribute sort
select * from Students
order by age desc, year_enrolled

Sketch of multi-attribute sorting function

int tupCompare(r1,r2,criteria)
{
   foreach (f,ord) in criteria {
      if (ord == ASC) {
         if (r1.f < r2.f) return -1;
         if (r1.f > r2.f) return 1;
      }
      else {
         if (r1.f > r2.f) return -1;
         if (r1.f < r2.f) return 1;
      }
   }
   return 0;
}

COMP9315 21T1 ♢ Sorting ♢ [5/14]
❖ Cost of Two-way Merge Sort

For a file containing b data pages:

Gives total cost:   2.b.ceil(log2b)

Example: Relation with r=105 and c=50     b=2000 pages.

Number of passes for sort:   ceil(log22000)  =  11

Reads/writes entire file 11 times!    Can we do better?

COMP9315 21T1 ♢ Sorting ♢ [6/14]
❖ n-Way Merge Sort

Initial pass uses: B total buffers

[Diagram:Pics/scansortproj/n-way-buf-pass0.png]

Reads B pages at a time, sorts in memory, writes out in order

COMP9315 21T1 ♢ Sorting ♢ [7/14]
❖ n-Way Merge Sort (cont)

Merge passes use:   n = B-1 input buffers,   1 output buffer

[Diagram:Pics/scansortproj/n-way-buf.png]

COMP9315 21T1 ♢ Sorting ♢ [8/14]
❖ n-Way Merge Sort (cont)

Method:


// Produce B-page-long runs
for each group of B pages in Rel {
    read B pages into memory buffers
    sort group in memory
    write B pages out to Temp
}
// Merge runs until everything sorted
numberOfRuns = ceil(b/B)
while (numberOfRuns > 1) {
    // n-way merge, where n=B-1
    for each group of n runs in Temp {
        merge into a single run via input buffers
        write run to newTemp via output buffer
    }
    numberOfRuns = ceil(numberOfRuns/n)
    Temp = newTemp // swap input/output files
}

COMP9315 21T1 ♢ Sorting ♢ [9/14]
❖ Cost of n-Way Merge Sort

Consider file where b = 4096, B = 16 total buffers:

(cf. two-way merge sort which needs 11 passes)
COMP9315 21T1 ♢ Sorting ♢ [10/14]
❖ Cost of n-Way Merge Sort (cont)

Generalising from previous example ...

For b data pages and B buffers


Cost = 2.b.(1 + ceil(lognb0)),   where b0 and n are defined above
COMP9315 21T1 ♢ Sorting ♢ [11/14]
❖ Sorting in PostgreSQL

Sort uses a merge-sort (from Knuth) similar to above:

Tuples are mapped to SortTuple structs for sorting: If all data fits into memory, sort using qsort().

If memory fills while reading, form "runs" and do disk-based sort.

COMP9315 21T1 ♢ Sorting ♢ [12/14]
❖ Sorting in PostgreSQL (cont)

Disk-based sort has phases:

Described in terms of "tapes" ("tape" sorted run)

Implementation of "tapes": backend/utils/sort/logtape.c

COMP9315 21T1 ♢ Sorting ♢ [13/14]
❖ Sorting in PostgreSQL (cont)

Sorting comparison operators are obtained via catalog (in Type.o):

// gets pointer to function via pg_operator
struct Tuplesortstate { ... SortTupleComparator ... };

// returns negative, zero, positive
ApplySortComparator(Datum datum1, bool isnull1,
                    Datum datum2, bool isnull2,
                    SortSupport sort_helper);

Flags in SortSupport indicate: ascending/descending, nulls-first/last.

ApplySortComparator() is PostgreSQL's version of tupCompare()

COMP9315 21T1 ♢ Sorting ♢ [14/14]


Produced: 1 Mar 2021