COMP9315 21T1 ♢ Selection ♢ [0/6]
Selection: select * from R where C
- filters a subset of tuples from one relation
R
- based on a condition
C
on the attribute values
We consider three distinct styles of selection:
- 1-d (one dimensional) (condition uses only 1 attribute)
- n-d (multi-dimensional) (condition uses >1 attribute)
- similarity (approximate matching, with ranking)
Each style has several possible file-structures/techniques.
COMP9315 21T1 ♢ Selection ♢ [1/6]
❖ Varieties of Selection (cont) | |
Selection returns a subset of tuples from a table
- rq = number of tuples that match query q
- bq = number of pages containing tuples that match query q
In the diagram, rq = 8, bq = 5
COMP9315 21T1 ♢ Selection ♢ [2/6]
❖ Varieties of Selection (cont) | |
Different categories of selection queries:
one ... queries with at most 1 result ...
0 ≤ rq ≤ 1, 0 ≤ bq ≤ 1
- typically, equality test on primary key attribute, e.g.
-
select * from R where id = 1234
pmr ... partial match retrieval ...
0 ≤ r
q ≤
r, 0 ≤ b
q ≤ b+b
ov
- conjunction of equality tests on multiple attributes, e.g.
-
select * from R where age=65
(1-d)
-
select * from R where age=65 and gender='m'
(n-d)
COMP9315 21T1 ♢ Selection ♢ [3/6]
❖ Varieties of Selection (cont) | |
More categories of selection queries:
rng ... range queries ...
0 ≤ rq ≤ r, 0 ≤ bq ≤ b+bov
- conjunction of inequalities, on one or more attributes, e.g.
-
select * from R where age≥18 and age≤21
(1-d)
-
select * from R where 18≤age≤21 and 160≤height≤190
(n-d)
pat ... pattern-based queries ...
0 ≤ r
q ≤
r, 0 ≤ b
q ≤ b+b
ov
- string-based matching using
like
or regular expressions
-
select * from R where name like '%oo%'
-
select * from R where name ~ '^Smi'
COMP9315 21T1 ♢ Selection ♢ [4/6]
❖ Varieties of Selection (cont) | |
More categories of selection queries:
sim ... similarity matching ... in theory, rq = r
... everything matches to some degree
- uses "similarity" measure (0 ≤ sim ≤ 1, 0=different, 1=identical)
-
select * from
Images where similar to
SampleImage
- results are ranked by sim value, from most to least similar
- can become a filter via
- threshold ... only items where sim ≥ min similarity
- top-k ... k items with highest similarities
We focus on
one,
pmr and
rng queries, but will discuss others
COMP9315 21T1 ♢ Selection ♢ [5/6]
❖ Implementing Select Efficiently | |
Two basic approaches:
- physical arrangement of tuples
- sorting
(search strategy)
- hashing
(static, dynamic, n-dimensional)
- additional indexing information
- index files
(primary, secondary, trees)
- signatures
(superimposed, disjoint)
Our analysis assumes 1 input buffer available for each relation.
If more buffers are available, most methods benefit.
COMP9315 21T1 ♢ Selection ♢ [6/6]
Produced: 6 Mar 2021