Projection-based Methods
(C-curves)

Achieve efficiency by

  • projecting data space onto reduced dimensions

  • utilising established access methods to find objects in projection

In the case of C-curves [Megiddo and Shaft]

  • project d-dimensional vectors onto several 1-d space-filling curves

  • use efficient primary-key look-up (e.g. B-tree) to find neighbours

  • intersect neighbour sets from several curves to obtain candidates

[Diagram:pic/c-curves]

Question: how many curves do we need to ensure that we find all/most of the near neighbours?

Answer: depends on the number of "significant" dimensions (determined by e.g. PCA).