❖ Query Cost Estimation |
Without executing a plan, cannot always know its precise cost.
Thus, query optimisers estimate costs via:
rS | cardinality of relation S |
RS | avg size of tuple in relation S |
V(A,S) | # distinct values of attribute A |
min(A,S) | min value of attribute A |
max(A,S) | max value of attribute A |
❖ Estimating Projection Result Size |
Straightforward, since we know:
rout = | πa,b,..(T) | = | T | = rT (in SQL, because of bag semantics)
Rout = sizeof(a) + sizeof(b) + ... + tuple-overhead
Assume page size B, bout = ceil(rT / cout), where cout = floor(B/Rout)
If using select distinct
❖ Estimating Selection Result Size |
Selectivity = fraction of tuples expected to satisfy a condition.
Common assumption: attribute values uniformly distributed.
Example: Consider the query
select * from Parts where colour='Red'
If V(colour,Parts)=4, r=1000 ⇒ |σcolour=red(Parts)|=250
In general, | σA=c(R) | ≅ rR / V(A,R)
Heuristic used by PostgreSQL: | σA=c(R) | ≅ r/10
❖ Estimating Selection Result Size (cont) |
Estimating size of result for e.g.
select * from Enrolment where year > 2015;
Could estimate by using:
❖ Estimating Selection Result Size (cont) |
Estimating size of result for e.g.
select * from Enrolment where course <> 'COMP9315';
Could estimate by using:
Heuristic used by some systems: | σA<>c(R) | ≅ r
❖ Estimating Selection Result Size (cont) |
How to handle non-uniform attribute value distributions?
White
Red
Blue
Silver
Use histogram as basis for determining # selected tuples.
Disadvantage: cost of storing/maintaining histograms.
❖ Estimating Selection Result Size (cont) |
Summary: analysis relies on operation and data distribution:
E.g. select * from R where a = k;
Case 1: uniq(R.a) ⇒ 0 or 1 result
Case 2: rR tuples && size(dom(R.a)) = n ⇒ rR / n results
E.g. select * from R where a < k;
Case 1: k ≤ min(R.a) ⇒ 0 results
Case 2: k > max(R.a) ⇒ ≅ rR results
Case 3: size(dom(R.a)) = n ⇒ ? min(R.a) ... k ... max(R.a) ?
❖ Estimating Join Result Size |
Analysis relies on semantic knowledge about data/relations.
Consider equijoin on common attr: R ⨝a S
Case 1: values(R.a) ∩ values(S.a) = {} ⇒ size(R ⨝a S) = 0
Case 2: uniq(R.a) and uniq(S.a) ⇒ size(R ⨝a S) ≤ min(|R|, |S|)
Case 3: pkey(R.a) and fkey(S.a) ⇒ size(R ⨝a S) ≤ |S|
❖ Cost Estimation: Postscript |
Inaccurate cost estimation can lead to poor evaluation plans.
Above methods can (sometimes) give inaccurate estimates.
To get more accurate cost estimates:
Trade-off between optimiser performance and query performance.
Produced: 5 Apr 2021