Experiments
Measures for accuracy:
- Acc1: average number of true top-10 entries that appear in the CurveIx top-10
- Acc2: how frequently CurveIx returns all 10 of the true top-10
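
The two accuracy measures can be sketched as follows. This is only an illustration, assuming each query has an exact top-10 list (e.g. from a linear scan) and a CurveIx top-10 list of object ids; the function name `accuracy` and its arguments are hypothetical, not taken from the CurveIx code.

```python
def accuracy(exact_sets, approx_sets, k=10):
    """Return (acc1, acc2) over a set of queries.

    exact_sets  -- per query, the ids of the true top-k answers
    approx_sets -- per query, the ids of the top-k answers from the index
    (hypothetical inputs, assumed to be lists of hashable ids)
    """
    # Number of true top-k entries found in the approximate top-k, per query
    overlaps = [len(set(e[:k]) & set(a[:k]))
                for e, a in zip(exact_sets, approx_sets)]
    acc1 = sum(overlaps) / len(overlaps)                  # average overlap
    acc2 = sum(o == k for o in overlaps) / len(overlaps)  # fraction with k out of k
    return acc1, acc2
```

Acc1 is reported here on a 0..10 scale and Acc2 as a fraction of queries; the notes do not say which scale was actually used.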
Measures for efficiency:
- Size: size of Db file + Index file
- Dist: number of distance calculations required
- IO: total amount of I/O performed
To determine how these measures vary:
- built databases of size 5K, 10K, 15K, 20K (each a superset of the previous)
- for each database, ran a "benchmark" set of 25 queries
- for each query, ran with 3, 5, 10, 20, 30, 40 curve-neighbours
  (but because of the curve-mapping problem, only got results for 20, 30, 40)
- for each query, ran with 20, 40, 60, 80, 100 curves
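
The parameter settings above can be enumerated as a grid. The sketch below assumes the settings were fully crossed (every database size with every query, neighbour count and curve count), which the notes do not state explicitly; all names are hypothetical.

```python
from itertools import product

DB_SIZES = [5_000, 10_000, 15_000, 20_000]   # database sizes
NEIGHBOURS = [3, 5, 10, 20, 30, 40]          # curve-neighbours per curve
CURVES = [20, 40, 60, 80, 100]               # number of curves
N_QUERIES = 25                               # size of the benchmark query set

def experiment_grid():
    """Yield every (db_size, query_id, neighbours, curves) combination."""
    return product(DB_SIZES, range(N_QUERIES), NEIGHBOURS, CURVES)

runs = list(experiment_grid())
print(len(runs))  # 4 * 25 * 6 * 5 = 3000 planned runs under this assumption
```

This lists the planned settings only; as noted above, results were obtained just for 20, 30 and 40 curve-neighbours.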
Also implemented a linear scan version for comparison and
to collect the exact answer sets.
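
A linear scan baseline of this kind might look like the sketch below. It assumes feature vectors are tuples of floats compared by Euclidean distance; the notes do not specify the distance function, and `linear_scan` is a hypothetical name.

```python
import heapq
import math

def linear_scan(db, query, k=10):
    """Return (ids of the k nearest vectors, number of distance calculations).

    db    -- list of feature vectors (assumed: equal-length tuples of floats)
    query -- query feature vector
    """
    # One distance calculation per database entry: this is the Dist measure
    dists = [(math.dist(vec, query), i) for i, vec in enumerate(db)]
    top = heapq.nsmallest(k, dists)   # k smallest distances, ties broken by id
    return [i for _, i in top], len(dists)
```

The returned id list serves as the exact answer set for a query, and the count always equals the database size, which gives an upper bound to compare against the Dist measure of the index.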