Projection-based Methods (C-curves)
Achieve efficiency by
- projecting data space onto reduced dimensions
- utilising established access methods to find objects in projection
In the case of C-curves
[Megiddo and Shaft]
- project d-dimensional vectors onto several 1-d space-filling curves
- use efficient primary-key look-up (e.g. B-tree) to find neighbours
- intersect neighbour sets from several curves to obtain candidates
Question: how many curves do we need to ensure that we find all/most
of the near neighbours?
Answer: depends on the number of "significant" dimensions (determined
by e.g. PCA).
|