next up previous contents
Next: Comprehensibility Up: Auslan Previous: Flock data   Contents

Experimental results

We began with the Nintendo data. TClass was used with all five basic metafeatures, as well as global feature extractors for the mean, maximum and minimum of each channel. The results obtained with the default settings are shown in Table 6.12. We have omitted the complete results for the segmentation approach and HMMs, including only the best result from each family.
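As an illustration of what a global feature extractor computes, the following is a minimal sketch, not the TClass implementation. It assumes an instance is represented as a mapping from channel name to a list of samples; the function name `global_features` and the channel names are hypothetical.

```python
from statistics import mean


def global_features(instance):
    """Compute mean, maximum and minimum for every channel.

    `instance` is assumed to map a channel name to a list of samples
    (a hypothetical representation chosen for this sketch).
    """
    features = {}
    for channel, values in instance.items():
        features[f"{channel}_mean"] = mean(values)
        features[f"{channel}_max"] = max(values)
        features[f"{channel}_min"] = min(values)
    return features


# A made-up two-channel recording, for illustration only.
sign = {"x": [0.0, 0.5, 1.0, 0.5], "y": [0.2, 0.2, 0.4, 0.0]}
features = global_features(sign)
```

Each channel contributes three attributes, so an instance with two channels yields six global features regardless of its length.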


Table 6.12: Error rates for the Nintendo sign language data.
Approach Error (per cent)
TClass with J48 $ 61.0 \pm 0.6$
TClass with PART $ 67.8 \pm 1.0$
TClass with IB1 $ 77.9 \pm 1.7$
TClass with Naive Bayes $ 50.5 \pm 1.3$
TClass with Bag $ 47.5 \pm 0.7$
TClass with AB $ 42.5 \pm 1.2$
Naive Segmentation $ 30.7 \pm 1.0$
Hidden Markov Model $ 28.8 \pm 0.3$


This looks very much like a disaster for our learners. However, this is a very hard domain, for two reasons. Firstly, the data are noisy and the sensing is of low quality: the gloves are very poor, and when similar samples are recorded using high-quality gloves the results are much better (see the Flock section). Secondly, the task itself has characteristics that make it difficult. There are 95 classes, far more than is usual for machine learning tasks; an investigation of the UCI repository [MM98] reveals that only one learning task contains more classes than this. Furthermore, the classes occur with equal frequency, so a ``random guess'' algorithm, a generic baseline, would have an error rate of about 99 per cent.
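The random-guess baseline follows directly from the class count: with equally frequent classes, a uniform guess is right one time in 95. A small sketch of the arithmetic (the function name is ours, for illustration):

```python
def random_guess_error(num_classes):
    """Expected error rate of uniform guessing over equally frequent classes."""
    return 1.0 - 1.0 / num_classes


baseline = random_guess_error(95)
print(f"{100 * baseline:.1f} per cent")  # prints "98.9 per cent"
```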

However, previous experiments on this data have performed well. In earlier work [Kad95], we hand-built feature extractors specifically designed for the domain and achieved a 17.0 per cent error rate on this data set. Earlier work [Kad99] also achieved an error rate of 24.0 per cent using an early version of TClass, which did not use directed segmentation for generating prototypical features (the approach is discussed in greater depth in Section A.2). Against these figures, the results in Table 6.12 are quite disappointing.

We suspected that the overfitting exhibited on the TTest dataset was appearing here as well. On further examination, however, this was not the case: there is not much room for overfitting. Only 16 samples per sign are used for learning, across 95 classes, so overfitting is unlikely to be the issue. Further experiments confirmed that pruning did not improve accuracy and had little impact on tree size.

We therefore tried a number of ways of increasing the performance of the classifiers: (i) smoothing the data, (ii) sampling more random centroids (10000 instead of 1000), and (iii) using relative rather than absolute values of heights and times. The results are shown in Table 6.13.
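To make modifications (i) and (iii) concrete, here is a rough sketch under stated assumptions. The text does not specify the smoothing method or how relative values are defined, so the centred moving average and the rescaling to a channel's own range used here are our illustrative choices, and the names `smooth` and `to_relative` are hypothetical.

```python
def smooth(values, window=3):
    """Centred moving average; the window shrinks at the ends of the series.

    Assumption: the source does not state which smoother TClass used.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def to_relative(values):
    """Rescale values to [0, 1] relative to their own minimum and maximum.

    Assumption: one plausible reading of "relative" heights and times.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


noisy = [0, 3, 0, 3, 0]
smoothed = smooth(noisy)
relative = to_relative([2.0, 4.0, 6.0])
```

Smoothing damps sample-to-sample jitter from the glove, while relative values make a feature such as the height of a peak comparable across recordings made at different scales.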


Table 6.13: Error rates with different TClass parameters.
Learner Base Smooth Centroids Relative All Three
J48 $ 61.0 \pm 0.6$ $ 61.2 \pm 0.9$ $ 58.4 \pm 1.0$ $ 61.9 \pm 0.9$ $ \mathbf{55.4 \pm 1.1}$
PART $ 67.8 \pm 1.0$ $ 62.2 \pm 0.9$ $ 60.3 \pm 1.7$ $ 64.0 \pm 0.9$ $ \mathbf{59.2 \pm 1.9}$
IB1 $ 77.9 \pm 1.7$ $ 43.2 \pm 1.0$ $ 42.5 \pm 1.0$ $ 80.7 \pm 0.8$ $ \mathbf{38.7 \pm 1.0}$
Bag $ 47.5 \pm 0.7$ $ 43.0 \pm 0.4$ $ 41.4 \pm 0.8$ $ 47.2 \pm 0.5$ $ \mathbf{39.8 \pm 1.1}$
AB $ 42.5 \pm 1.2$ $ 41.2 \pm 0.6$ $ 37.4 \pm 0.9$ $ 44.0 \pm 1.0$ $ \mathbf{35.5 \pm 1.0}$


The results are very promising. Smoothing and sampling more centroids each improve accuracy for most learners, and combining all three changes gives the lowest error rate for every learner. What is more, with the information-based learners, the changes also reduce the size of the tree or rule set. Between the base case and the final case, tree size is reduced by 9 per cent for J48, while error rate is reduced by 11 per cent. Similarly for PART, the number of rules is reduced by 15 per cent and the error is reduced by 9 per cent. Since the remainder of the learning algorithm is the same in both cases, this suggests that the improvement results from higher-quality features being extracted.

Finally, to obtain higher accuracy, we applied voting to the best-performing individual TClass learner, AdaBoost. The results are shown in Figure 6.26.

Figure 6.26: Voting TClass-generated classifiers approaches the error of hand-selected features.
[Figure: oldsign-abvote.eps]

We see that voting improves results. With 4 voting TClass classifiers, we outperform all other learners. With 25 voting TClass classifiers, we appear to be asymptotically approaching the accuracy of the hand-extracted features. Again, this comes at the expense of comprehensibility, but the results are still good.
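The voting step can be sketched as follows. This is a minimal illustration that assumes unweighted plurality voting, with ties going to the earliest classifier; the text does not state the exact voting scheme, and the function name `vote` and the sign labels are hypothetical.

```python
from collections import Counter


def vote(predictions):
    """Return the label predicted by the most classifiers.

    Ties are broken in favour of the label that appears first in
    `predictions` (an assumption made for this sketch).
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Predictions for one sign from four hypothetical TClass classifiers.
winner = vote(["hello", "thank", "hello", "please"])
```

Because the final label depends on many classifiers at once, no single tree or rule set explains the decision, which is the loss of comprehensibility noted above.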


Mohammed Waleed Kadous 2002-12-10