Banchong Harangsri, John Shepherd, Anne H.H. Ngu
Eighth International Conference and Workshop on Database and Expert-systems Applications (DEXA'97), Toulouse, France, September 1997
(Compressed Postscript ... 41KB)
We first develop a theoretical foundation for systematic sampling, which suggests that the method yields a more representative sample than traditional simple random sampling. Experimental analysis on a range of synthetic relations then confirms that the sample relations (participating in a join) produced by systematic sampling are of higher quality than those produced by simple random sampling.
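The difference between the two sampling schemes can be illustrated with a minimal sketch. It assumes a relation held in memory as a Python list and the textbook form of systematic sampling (a random start followed by every k-th tuple); the function names are hypothetical and the abstract does not state that the paper uses exactly this procedure.

```python
import random


def simple_random_sample(relation, n, rng):
    """Draw n tuples uniformly at random, without replacement."""
    return rng.sample(relation, n)


def systematic_sample(relation, n, rng):
    """Pick a random start in the first interval, then take every k-th tuple.

    Assumption: k = len(relation) // n, the usual textbook choice.
    """
    k = len(relation) // n
    if k < 1:
        raise ValueError("sample size exceeds relation size")
    start = rng.randrange(k)
    return [relation[start + i * k] for i in range(n)]


rng = random.Random(0)
relation = list(range(100))  # stand-in for tuples ordered on the join attribute
srs = simple_random_sample(relation, 10, rng)
sys_sample = systematic_sample(relation, 10, rng)
```

When the relation is ordered on the join attribute, the systematic sample is spread evenly across the whole value range, whereas a simple random sample may cluster in some regions and miss others.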
To verify that the sample relations produced by systematic sampling actually lead to more accurate join selectivity estimates, we compare systematic sampling with t_cross, the most efficient simple random sampling method, over a variety of star joins and relation configurations. The results demonstrate that, for the same amount of sampling, systematic sampling provides considerably more accurate join selectivities than t_cross sampling.
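As a rough illustration of what estimating a join selectivity from samples involves, the sketch below computes the selectivity of a two-way equi-join as the join size divided by the size of the cross product. Applying the same function to sample relations instead of the full relations gives a sample-based estimate. This is a generic estimator written under our own assumptions (hypothetical function names, relations reduced to lists of join-key values); it is not the t_cross procedure, whose details are not given here, and it covers a single two-way join rather than a star join.

```python
from collections import Counter


def join_size(r_keys, s_keys):
    """Number of result tuples of an equi-join, given each side's join-key values."""
    r_counts = Counter(r_keys)
    s_counts = Counter(s_keys)
    # Each key contributes (occurrences in R) * (occurrences in S) result tuples.
    return sum(count * s_counts[key] for key, count in r_counts.items())


def join_selectivity(r_keys, s_keys):
    """Join size as a fraction of the cross-product size."""
    return join_size(r_keys, s_keys) / (len(r_keys) * len(s_keys))


r = [1, 1, 2]
s = [1, 2, 2, 3]
true_selectivity = join_selectivity(r, s)
```

Comparing the value obtained from the samples with the value obtained from the full relations gives the estimation error that the experiments in the paper measure.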
Keys: relational query optimisation, query size estimation, sampling