
Improving by Direct Feedback from the Learner

So far, we have used a disparity measure to evaluate the ``goodness'' of a particular region segmentation. This disparity measure is really a low-computation proxy for the learner, built on the assumption that a segmentation with a high disparity measure will be useful to the learner. Realising this leads to an obvious question: if the disparity measure is a proxy for the learner, why not use the learner itself to evaluate the effectiveness of a region segmentation directly? This suggests modifying our random search algorithm so that (a) the disparity measure is the cross-validated accuracy (hence, the higher the accuracy, the higher the disparity measure) and (b) the most accurate subset of features is selected. This leads to the improved algorithm shown in Figure 7.1. Here we have used the Attribute function defined in Figure 5.5. We also assume that a learner's accuracy can be evaluated using the method crossValAcc, which takes a learner and a dataset and returns the cross-validated accuracy.

Figure 7.1: Random search algorithm, using the learner itself
\fbox{
\begin{minipage}{5in}
Inputs:\\
\hspace*{1cm}$I=[\langle [i_{11},...,i_{...
...m}End\\
$C$\ := $\mathit{bestCentroids}$\\
return $C$\ \\
End
\end{minipage}}
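The following Python sketch illustrates the idea of using cross-validated accuracy as the disparity measure inside a random search; it is not a transcription of Figure 7.1. All names are hypothetical. `attribute` assumes that each synthetic feature counts how many of an instance's points lie nearest a given centroid (a stand-in for the Attribute function of Figure 5.5, whose exact definition is not reproduced here), `cross_val_acc` plays the role of crossValAcc, and 1-nearest-neighbour stands in for whatever learner is actually used. Step (b), selecting the most accurate subset of features, is not modelled.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Vector = List[float]
# A learner takes training vectors and labels and returns a classifier.
Learner = Callable[[List[Vector], List[str]], Callable[[Vector], str]]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def attribute(points: List[Point], centroids: List[Point]) -> Vector:
    """Hypothetical stand-in for Attribute (Figure 5.5): for one training
    instance, count how many of its points fall nearest to each centroid."""
    counts = [0.0] * len(centroids)
    for p in points:
        j = min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
        counts[j] += 1
    return counts


def one_nn(train_x: List[Vector], train_y: List[str]) -> Callable[[Vector], str]:
    """A deliberately simple learner: 1-nearest-neighbour."""
    def classify(x: Vector) -> str:
        i = min(range(len(train_x)), key=lambda k: sq_dist(train_x[k], x))
        return train_y[i]
    return classify


def cross_val_acc(learner: Learner, xs: List[Vector], ys: List[str],
                  folds: int = 10, seed: int = 0) -> float:
    """k-fold cross-validated accuracy (plays the role of crossValAcc)."""
    order = list(range(len(xs)))
    random.Random(seed).shuffle(order)
    correct = 0
    for f in range(folds):
        test = set(order[f::folds])           # every folds-th shuffled index
        train = [i for i in order if i not in test]
        clf = learner([xs[i] for i in train], [ys[i] for i in train])
        correct += sum(clf(xs[i]) == ys[i] for i in test)
    return correct / len(xs)


def random_search_with_learner(instances: List[List[Point]], labels: List[str],
                               k: int, num_trials: int, learner: Learner,
                               folds: int = 10, seed: int = 0):
    """Keep the random set of k centroids whose synthetic features give the
    highest cross-validated accuracy (accuracy used as the disparity measure)."""
    rng = random.Random(seed)
    pool = [p for pts in instances for p in pts]  # candidate centroids: training points
    best_acc, best_centroids = -1.0, None
    for _ in range(num_trials):
        centroids = rng.sample(pool, k)
        xs = [attribute(pts, centroids) for pts in instances]
        acc = cross_val_acc(learner, xs, labels, folds)
        if acc > best_acc:
            best_acc, best_centroids = acc, centroids
    return best_centroids, best_acc


if __name__ == "__main__":
    # Toy data: class A instances have points near (0, 0), class B near (10, 10).
    a = [[(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)] for _ in range(4)]
    b = [[(10.0, 10.0), (10.1, 10.0), (10.0, 10.1)] for _ in range(4)]
    centroids, acc = random_search_with_learner(
        a + b, ["A"] * 4 + ["B"] * 4, k=2, num_trials=50, learner=one_nn, folds=4)
    print(centroids, acc)
```

Each trial trains the learner `folds` times, which is exactly where the cost discussed next comes from.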

The algorithm in Figure 7.1 uses the learner to evaluate real performance. Some difficulties arise, however. Firstly, the whole point of using the disparity measure was speed: it would be too slow to evaluate each region segmentation this way. For example, consider the new Auslan dataset, which has approximately 110 metafeatures. If we use 10-fold cross-validation with $ \mathit{numTrials} = 1000$, we would need to run the learner $ 110 \times 10 \times 1000 = 1.1 \times 10^6$ times. Even if each learner run takes an average of only 0.1 seconds to generate a classifier, this would take about $ 1.1 \times 10^5$ seconds (roughly 30 hours), over a day of computation.
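The cost estimate above is simple enough to reproduce directly. The helper names below are hypothetical; the only inputs are the figures quoted in the text (110 metafeatures, 10 folds, 1000 trials, 0.1 seconds per learner run), and the model ignores the cost of computing the synthetic features themselves.

```python
def wrapper_runs(num_metafeatures: int, folds: int, num_trials: int) -> int:
    """Number of learner runs: one per fold, per random trial, per metafeature."""
    return num_metafeatures * folds * num_trials


def wrapper_hours(num_metafeatures: int, folds: int, num_trials: int,
                  secs_per_run: float) -> float:
    """Total learner time in hours, assuming a constant time per run."""
    return wrapper_runs(num_metafeatures, folds, num_trials) * secs_per_run / 3600


runs = wrapper_runs(110, 10, 1000)          # 1,100,000 learner runs
hours = wrapper_hours(110, 10, 1000, 0.1)   # about 30.6 hours
print(f"{runs:,} runs, {hours:.1f} hours")
```

Changing `num_trials` or `folds` shows how quickly the cost falls if fewer trials or folds are used.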

Secondly, we are actually testing the performance of each metafeature in isolation. As pointed out in Section 5.2.2, the results of all the random search runs are combined to form a single feature vector. While a particular segmentation may work well if it is the only set of synthetic features, it may be of no use when it is combined with the segmentations of other regions. Similarly, a set of synthetic features may only be useful in combination with another set. Hence, even using the learner itself gives only an approximation of how useful a segmentation will be in the end. And if it is an approximation, what guarantee is there that the learner's cross-validated accuracy is any better a disparity measure than chi-squared or the gain ratio?
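The second difficulty can be seen in a toy example, constructed for illustration rather than taken from the Auslan experiments: two binary synthetic features whose exclusive-or determines the class. Each feature on its own is useless, yet together they classify perfectly, so evaluating either one in isolation would discard it. To keep the numbers exact, the sketch measures the training-set accuracy of a simple lookup-table learner (the hypothetical helper `table_learner_accuracy`) rather than cross-validated accuracy.

```python
from collections import Counter, defaultdict
from itertools import product
from typing import List, Sequence, Tuple


def table_learner_accuracy(rows: List[Tuple[int, ...]], labels: List[int],
                           cols: Sequence[int]) -> float:
    """Training-set accuracy of a lookup-table learner that predicts the
    majority label for each distinct value of the selected columns."""
    groups = defaultdict(Counter)
    for row, y in zip(rows, labels):
        groups[tuple(row[c] for c in cols)][y] += 1
    # The majority label of each group is predicted; its count is the number correct.
    correct = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return correct / len(rows)


# Two synthetic binary features; the class is their exclusive-or.
rows = [r for r in product([0, 1], repeat=2) for _ in range(5)]
labels = [a ^ b for a, b in rows]

print(table_learner_accuracy(rows, labels, [0]))     # 0.5
print(table_learner_accuracy(rows, labels, [1]))     # 0.5
print(table_learner_accuracy(rows, labels, [0, 1]))  # 1.0
```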

This idea has certain similarities to the ``wrapper'' method suggested by John, Kohavi and Pfleger [JKP94] for feature selection, and indeed we are using a kind of wrapper, albeit in a very different way: the learner evaluates candidate region segmentations rather than choosing among an existing set of features.
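For comparison, a wrapper in the feature-selection sense can be sketched as a greedy forward selection whose evaluation function is the learner's estimated accuracy on a candidate feature subset. This is only one simple search strategy that can be plugged into a wrapper and is not claimed to be the exact procedure of [JKP94]. `forward_select` is a hypothetical name, and the additive scoring function in the usage below is a placeholder for cross-validated accuracy.

```python
from typing import Callable, FrozenSet, Iterable, Tuple


def forward_select(features: Iterable[str],
                   evaluate: Callable[[FrozenSet[str]], float]
                   ) -> Tuple[FrozenSet[str], float]:
    """Greedy forward selection: repeatedly add the single feature that most
    improves evaluate(subset); stop when no addition helps."""
    remaining = set(features)
    chosen: FrozenSet[str] = frozenset()
    best = evaluate(chosen)
    while remaining:
        # Try each remaining feature added to the current subset.
        score, f = max((evaluate(chosen | {g}), g) for g in remaining)
        if score <= best:
            break
        chosen, best = chosen | {f}, score
        remaining.discard(f)
    return chosen, best


if __name__ == "__main__":
    # Placeholder evaluation: in a real wrapper this would be crossValAcc
    # of the learner trained on only the features in the subset.
    weights = {"pitch": 3, "roll": 2, "noise": -1}
    print(forward_select(weights, lambda s: sum(weights[f] for f in s)))
```

Because every call to `evaluate` trains the learner, the number of evaluations, not the search logic, dominates the cost, which is the same concern raised above for region segmentation.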

But what about speed? This can be addressed in two ways. Firstly, it may not be necessary to run as many trials as when using the disparity measure. Secondly, it may be possible to use a hybrid approach that takes advantage of both disparity measures and cross-validated accuracy. For instance, a disparity measure could be used to ``short list'' the region segmentations with the highest disparity measures (say the top 20), and the learner's cross-validated accuracy could then be used to choose the best among these 20.
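The hybrid approach can be sketched as a two-stage filter: rank all candidate segmentations by the cheap disparity measure, keep the top few, and spend learner runs only on those. In this sketch `hybrid_select`, `disparity` and `cv_accuracy` are hypothetical names; candidates may be any representation of a region segmentation, and `cv_accuracy` would call crossValAcc in practice.

```python
import heapq
from typing import Callable, Sequence, TypeVar

S = TypeVar("S")  # a candidate region segmentation (any representation)


def hybrid_select(candidates: Sequence[S],
                  disparity: Callable[[S], float],
                  cv_accuracy: Callable[[S], float],
                  shortlist_size: int = 20) -> S:
    """Shortlist the candidates with the highest (cheap) disparity measure,
    then return the shortlisted candidate with the best (expensive)
    cross-validated accuracy."""
    shortlist = heapq.nlargest(shortlist_size, candidates, key=disparity)
    return max(shortlist, key=cv_accuracy)


if __name__ == "__main__":
    calls = []

    def expensive(c: int) -> float:
        calls.append(c)           # record each (simulated) learner evaluation
        return -abs(c - 57)

    best = hybrid_select(range(100), disparity=lambda c: -abs(c - 50),
                         cv_accuracy=expensive)
    print(best, len(calls))       # cv_accuracy is called once per shortlisted item
```

Under the same assumptions as the earlier estimate (110 metafeatures, 10-fold cross-validation, 0.1 seconds per run), a shortlist of 20 per metafeature would need $110 \times 20 \times 10 = 22{,}000$ learner runs, about 37 minutes, with the remaining trials scored only by the disparity measure.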


Mohammed Waleed Kadous 2002-12-10