
Estimating Selection Result Size (cont)

How to handle non-uniform attribute value distributions?
  • collect statistics about the values stored in the attribute/relation
  • store these as e.g. a histogram in the meta-data for the relation
So, for the part colour example, we might have a distribution like:

White: 35%   Red: 30%   Blue: 25%   Silver: 10%

Use the histogram as the basis for estimating the # selected tuples, e.g. colour='Red' selects about 30% of the tuples.
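A minimal sketch of histogram-based estimation, assuming the histogram stores an exact percentage for every distinct value and that the relation has r = 1000 tuples (a made-up size for illustration). The function names (est_eq, est_in, est_uniform) are hypothetical, not taken from any DBMS. An equality selection uses one bucket; an IN-list sums the buckets; the uniform estimate is shown for comparison.

```python
from typing import Dict, Iterable

# Hypothetical histogram for the part colour example: value -> % of tuples
COLOUR_HIST: Dict[str, int] = {"White": 35, "Red": 30, "Blue": 25, "Silver": 10}

def est_eq(r: int, hist: Dict[str, int], value: str) -> int:
    """Estimated # tuples with attr = value (0 if value absent from histogram)."""
    return round(r * hist.get(value, 0) / 100)

def est_in(r: int, hist: Dict[str, int], values: Iterable[str]) -> int:
    """Estimated # tuples with attr IN values: bucket percentages simply add."""
    # set() removes duplicates so a value is not counted twice
    return round(r * sum(hist.get(v, 0) for v in set(values)) / 100)

def est_uniform(r: int, n_distinct: int) -> int:
    """Estimate under the uniform-distribution assumption: r / n_distinct."""
    return round(r / n_distinct)

if __name__ == "__main__":
    r = 1000  # assumed number of tuples in the Part relation
    print(est_eq(r, COLOUR_HIST, "Red"))            # 300
    print(est_eq(r, COLOUR_HIST, "Silver"))         # 100
    print(est_in(r, COLOUR_HIST, ["Red", "Blue"]))  # 550
    print(est_uniform(r, len(COLOUR_HIST)))         # 250
```

Under the uniform assumption every colour would be estimated at 250 tuples, overestimating Silver (100) by 150 and underestimating White (350) by 100; the histogram removes that error at the price of storing and maintaining it.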

Disadvantage: cost of storing histograms and keeping them up to date as the relation changes.