Selection Overview

❖ Varieties of Selection

Selection: select * from R where C

We consider three distinct styles of selection:

Each style has several possible file-structures/techniques.

❖ Varieties of Selection (cont)

Selection returns a subset of tuples from a table

In the diagram, r_q = 8 (number of matching tuples) and b_q = 5 (number of pages containing matching tuples)
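A minimal sketch of how the two counts could be computed, assuming a file is modelled as a list of pages and each page as a list of tuples. The names `count_matches`, `pages` and `cond` are hypothetical and used only for illustration.

```python
def count_matches(pages, cond):
    """Return (r_q, b_q) for a file modelled as a list of pages."""
    r_q = 0  # tuples that satisfy the condition
    b_q = 0  # pages holding at least one such tuple
    for page in pages:
        hits = sum(1 for t in page if cond(t))
        r_q += hits
        if hits > 0:
            b_q += 1
    return r_q, b_q


# Hypothetical three-page file of (id, tag) tuples.
pages = [
    [(1, "a"), (2, "b")],
    [(3, "a"), (4, "c")],
    [(5, "d"), (6, "e")],
]
r_q, b_q = count_matches(pages, lambda t: t[1] == "a")
print(r_q, b_q)
```

Since a page is counted once however many matches it holds, b_q can never exceed r_q.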

❖ Varieties of Selection (cont)

Different categories of selection queries:

one ... queries with at most 1 result ... 0 ≤ r_q ≤ 1, 0 ≤ b_q ≤ 1

pmr ... partial match retrieval ... 0 ≤ r_q ≤ r, 0 ≤ b_q ≤ b+b_ov

❖ Varieties of Selection (cont)

More categories of selection queries:

rng ... range queries ... 0 ≤ r_q ≤ r, 0 ≤ b_q ≤ b+b_ov

pat ... pattern-based queries ... 0 ≤ r_q ≤ r, 0 ≤ b_q ≤ b+b_ov
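The four categories so far can be illustrated with one query each. This is only a sketch using Python's built-in `sqlite3` module; the table `People`, its columns and its rows are made up for the example and do not come from the slides.

```python
import sqlite3

# Hypothetical in-memory table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table People (id integer primary key, name text, age integer, gender text)"
)
conn.executemany(
    "insert into People values (?, ?, ?, ?)",
    [(1, "Moore", 65, "m"), (2, "Wood", 19, "f"),
     (3, "Smith", 65, "f"), (4, "Jones", 21, "m")],
)

queries = {
    # one: equality on a unique key, so at most one result
    "one": "select * from People where id = 3",
    # pmr: equality on some attributes, possibly many results
    "pmr": "select * from People where age = 65",
    # rng: attribute value constrained to an interval
    "rng": "select * from People where age >= 18 and age <= 21",
    # pat: attribute value matched against a pattern
    "pat": "select * from People where name like '%oo%'",
}
results = {kind: conn.execute(q).fetchall() for kind, q in queries.items()}
for kind, rows in results.items():
    print(kind, len(rows))
```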

❖ Varieties of Selection (cont)

More categories of selection queries:

sim ... similarity matching ... in theory, r_q = r ... everything matches to some degree

uses "similarity" measure (0 ≤ sim ≤ 1, 0=different, 1=identical)
select * from Images where similar to SampleImage
results are ranked by sim value, from most to least similar
can become a filter via
- threshold ... only items where sim ≥ min similarity
- top-k ... k items with highest similarities
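The two filters could be sketched as follows, assuming the similarity values have already been computed and are held as (item, sim) pairs. The function names and the sample scores are hypothetical.

```python
import heapq


def threshold_filter(scored, min_sim):
    """Keep items with sim >= min_sim, ranked from most to least similar."""
    kept = [(item, sim) for item, sim in scored if sim >= min_sim]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def top_k(scored, k):
    """Return the k items with the highest similarity, most similar first."""
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Made-up similarity scores in the range 0..1.
scored = [("img1", 0.91), ("img2", 0.15), ("img3", 0.67), ("img4", 0.40)]
print(threshold_filter(scored, 0.5))
print(top_k(scored, 3))
```

The threshold version returns a result of unpredictable size, while top-k fixes the result size at k (or fewer, if there are fewer items).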

We focus on one, pmr and rng queries, but will also discuss the others.

❖ Implementing Select Efficiently

Two basic approaches:

physical arrangement of tuples
- sorting (search strategy)
- hashing (static, dynamic, n-dimensional)
additional indexing information
- index files (primary, secondary, trees)
- signatures (superimposed, disjoint)
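To give a feel for why physical arrangement matters, the sketch below counts page reads for a single-key lookup under three arrangements: unordered (linear scan), sorted (binary search over pages) and hashed. It is a toy model under stated assumptions: pages are non-empty Python lists of integer keys, each hash bucket is exactly one page with no overflow pages, and all function names are hypothetical. Index files and signatures are not covered.

```python
def linear_scan(pages, key):
    """Read pages in order until the key is found; return (found, pages_read)."""
    for n, page in enumerate(pages, start=1):
        if key in page:
            return True, n
    return False, len(pages)


def binary_search(pages, key):
    """Pages are sorted on the key; return (found, pages_read)."""
    lo, hi, reads = 0, len(pages) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        page = pages[mid]
        reads += 1
        if key < page[0]:
            hi = mid - 1
        elif key > page[-1]:
            lo = mid + 1
        else:
            return key in page, reads
    return False, reads


def hashed_lookup(buckets, key):
    """Assumes one page per bucket and no overflow pages."""
    page = buckets[hash(key) % len(buckets)]
    return key in page, 1


# 24 keys stored 3 per page, giving 8 pages in either arrangement.
keys = range(1, 25)
sorted_pages = [list(keys[i:i + 3]) for i in range(0, 24, 3)]
buckets = [[] for _ in range(8)]
for k in keys:
    buckets[hash(k) % 8].append(k)

print(linear_scan(sorted_pages, 22))
print(binary_search(sorted_pages, 22))
print(hashed_lookup(buckets, 22))
```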

Our analysis assumes 1 input buffer available for each relation.

If more buffers are available, most methods benefit.