COMP9315 21T1 ♢ Query Optimisation ♢ [0/14]
Query optimiser: RA expression → efficient evaluation plan
COMP9315 21T1 ♢ Query Optimisation ♢ [1/14]
❖ Query Optimisation (cont) | |
Query optimisation is a critical step in query evaluation.
The query optimiser
- takes relational algebra expression from SQL compiler
- produces sequence of RelOps to evaluate the expression
- query execution plan should provide efficient evaluation
"Optimisation" is a misnomer since query optimisers
- aim to find a good plan ... but maybe not optimal
Observed Query Time = Planning time + Evaluation time
COMP9315 21T1 ♢ Query Optimisation ♢ [2/14]
❖ Query Optimisation (cont) | |
Why do we not generate optimal query execution plans?
Finding an optimal query plan ...
- requires exhaustive search of a space of possible plans
- for each possible plan, need to estimate cost
(not cheap)
Even for relatively small query, search space is
very large.
Compromise:
- do limited search of query plan space
(guided by heuristics)
- quickly choose a reasonably efficient execution plan
COMP9315 21T1 ♢ Query Optimisation ♢ [3/14]
❖ Approaches to Optimisation | |
Three main classes of techniques developed:
- algebraic (equivalences, rewriting, heuristics)
- physical (execution costs, search-based)
- semantic (application properties, heuristics)
All driven by aim of minimising
(or at least reducing) "cost".
Real query optimisers use a combination of algrebraic+physical.
Semantic QO is good idea, but expensive/difficult to implement.
COMP9315 21T1 ♢ Query Optimisation ♢ [4/14]
❖ Approaches to Optimisation (cont) | |
Example of optimisation transformations:
For join, may also consider sort/merge join and hash join.
COMP9315 21T1 ♢ Query Optimisation ♢ [5/14]
❖ Cost-based Query Optimiser | |
Approximate algorithm for cost-based optimisation:
translate SQL query to RAexp
for enough transformations RA' of RAexp {
while (more choices for RelOps) {
Plan = {}; i = 0; cost = 0
for each node e of RA' {
ROp = select RelOp method for e
Plan = Plan ∪ ROp
cost += Cost(ROp)
}
if (cost < MinCost)
{ MinCost = cost; BestPlan = Plan }
}
}
Heuristics: push selections down, consider only left-deep join trees.
COMP9315 21T1 ♢ Query Optimisation ♢ [6/14]
❖ Cost Models and Analysis | |
The cost of evaluating a query is determined by:
- size of relations (database relations and temporary relations)
- access mechanisms (indexing, hashing, sorting, join algorithms)
- size/number of main memory buffers (and replacement strategy)
Analysis of costs involves
estimating:
- size of intermediate results
- number of disk reads/writes
COMP9315 21T1 ♢ Query Optimisation ♢ [7/14]
❖ Choosing Access Methods (RelOps) | |
Performed for each node in RA expression tree ...
Inputs:
- a single RA operation (σ, π, ⨝)
- information about file organisation, data distribution, ...
- list of operations available in the database engine
Output:
- specific DBMS operation to implement this RA operation
COMP9315 21T1 ♢ Query Optimisation ♢ [8/14]
❖ Choosing Access Methods (RelOps) (cont) | |
Example:
- RA operation: Sel[name='John' ∧ age>21](Student)
-
Student
relation has B-tree index on name
- database engine (obviously) has B-tree search method
giving
tmp[i] := BtreeSearch[name='John'](Student)
tmp[i+1] := LinearSearch[age>21](tmp[i])
Where possible, use pipelining to avoid storing tmp[i]
on disk.
COMP9315 21T1 ♢ Query Optimisation ♢ [9/14]
❖ Choosing Access Methods (RelOps) (cont) | |
Rules for choosing σ access methods:
- σA=c(R) and
R
has index on A
⇒ indexSearch[A=c](R)
- σA=c(R) and
R
is hashed on A
⇒ hashSearch[A=c](R)
- σA=c(R) and
R
is sorted on A
⇒ binarySearch[A=c](R)
- σA ≥ c(R) and
R
has clustered index on A
⇒ indexSearch[A=c](R)
then scan
- σA ≥ c(R) and
R
is hashed on A
⇒ linearSearch[A>=c](R)
COMP9315 21T1 ♢ Query Optimisation ♢ [10/14]
❖ Choosing Access Methods (RelOps) (cont) | |
Rules for choosing ⨝ access methods:
- R ⨝ S and
R
fits in memory buffers
⇒ bnlJoin(R,S)
- R ⨝ S and
S
fits in memory buffers
⇒ bnlJoin(S,R)
- R ⨝ S and
R
,S
sorted on join attr
⇒ smJoin(R,S)
- R ⨝ S and
R
has index on join attr
⇒ inlJoin(S,R)
- R ⨝ S and no indexes, no sorting
⇒
hashJoin(R,S)
(bnl
= block nested loop;
inl
= index nested loop;
sm
= sort merge)
COMP9315 21T1 ♢ Query Optimisation ♢ [11/14]
❖ PostgreSQL Query Optimization | |
Input: tree of Query
nodes returned by parser
Output: tree of Plan
nodes used by query executor
- wrapped in a
PlannedStmt
node containing state info
Intermediate data structures are trees of
Path
nodes
- a path tree represents one evaluation order for a query
All
Node
types are defined in
include/nodes/*.h
COMP9315 21T1 ♢ Query Optimisation ♢ [12/14]
❖ PostgreSQL Query Optimization (cont) | |
Query optimisation proceeds in two stages (after parsing)...
Rewriting:
- uses PostgreSQL's rule system
- query tree is expanded to include e.g. view definitions
Planning and optimisation:
- using cost-based analysis of generated paths
- via one of two different path generators
- chooses least-cost path from all those considered
Then produces a
Plan
tree from the selected path.
COMP9315 21T1 ♢ Query Optimisation ♢ [13/14]
❖ PostgreSQL Query Optimization (cont) | |
COMP9315 21T1 ♢ Query Optimisation ♢ [14/14]
Produced: 5 Apr 2021