
TEMPORAL CLASSIFICATION TASKS

A system called TClass has been implemented which uses metafeatures to classify multivariate time series. In addition to implementing all of the features discussed so far, it incorporates mechanisms for using many metafeatures concurrently, integrating conventional attributes, and extracting global characteristics of a time series such as its mean and range.

We implemented three kinds of metafeature: increase and decrease, which detect intervals of increase and decrease and have a parameter space of the form (average, middletime, duration, gradient); plateau, which is much like increase or decrease but without the gradient; and local maxima and minima, with the parameter space (height, time). The extraction functions were implemented with some robustness to noise, so that ``blips'' that cause, for example, a momentary decrease within an interval of increase are ignored.
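As an illustration of the extraction functions described above, the following is a minimal sketch for a single channel. It is not the TClass implementation: the function names, the `min_duration` and `max_blip` parameters, and the rule of tolerating a fixed number of consecutive non-rising steps are all assumptions made for the example.

```python
from typing import List, Tuple


def local_maxima(values: List[float]) -> List[Tuple[float, int]]:
    """Return (height, time) for each sample strictly above both neighbours.

    Unlike the extraction described in the text, this simple version
    applies no noise filtering.
    """
    return [
        (values[t], t)
        for t in range(1, len(values) - 1)
        if values[t - 1] < values[t] > values[t + 1]
    ]


def increasing_intervals(values, min_duration=3, max_blip=1):
    """Return (average, middletime, duration, gradient) per interval of increase.

    Assumption: up to `max_blip` consecutive non-rising steps inside a run
    are treated as noise and do not end the interval.
    """
    events = []
    start = 0      # first sample of the current candidate interval
    last_rise = 0  # last sample that was reached by a rising step
    blips = 0      # consecutive non-rising steps since last_rise

    def close(first, last):
        duration = last - first
        if duration >= min_duration:
            segment = values[first:last + 1]
            average = sum(segment) / len(segment)
            middletime = (first + last) / 2
            gradient = (values[last] - values[first]) / duration
            events.append((average, middletime, duration, gradient))

    for t in range(1, len(values)):
        if values[t] > values[t - 1]:
            last_rise, blips = t, 0
        elif last_rise == start:
            # No rise seen yet: slide the candidate start forward.
            start = last_rise = t
        else:
            blips += 1
            if blips > max_blip:
                close(start, last_rise)
                start = last_rise = t
                blips = 0
    close(start, last_rise)
    return events
```

With `max_blip=1`, the series `[0, 1, 2, 1.5, 3, 4]` yields a single interval of increase spanning all six samples; with `max_blip=0` the dip at time 3 splits it into two runs that are both shorter than `min_duration`, so nothing is reported.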

We tested TClass on four domains: CBF, TTest, Auslan and ECG.

For comparison, we also applied two baseline learners: naive segmentation (Naive) and hidden Markov models (HMM).

For back-end learning, the following learners were used: J48, Weka's [Witten & Frank, 1999] implementation of a C4.5-style [Quinlan, 1993] algorithm; PART, the Weka equivalent of c4.5rules; and bagging and boosting with J48 as the base learner. Voting over the boosted learner was also performed, using repeated runs of synthetic feature construction until convergence; voting used AdaBoost as its base learner. The $ \chi^2$ disparity measure was used for directed segmentation. The error rates are shown in Table 4, which gives the mean error (in per cent) over ten-fold cross-validation and the standard error of the mean. The first five rows use metafeatures; the last two are the baseline learners.
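The figures reported per learner and domain are a mean over folds and a standard error of that mean. A minimal sketch of the computation follows; it assumes the sample standard deviation (divisor n - 1) is used, which the text does not state, and the fold errors in the usage note are made-up numbers, not results from the experiments.

```python
import math


def mean_and_standard_error(fold_errors):
    """Return (mean, standard error of the mean) for per-fold error rates.

    Assumption: the standard error is s / sqrt(n), where s is the sample
    standard deviation computed with divisor n - 1.
    """
    n = len(fold_errors)
    if n < 2:
        raise ValueError("need at least two folds to estimate a standard error")
    mean = sum(fold_errors) / n
    variance = sum((e - mean) ** 2 for e in fold_errors) / (n - 1)
    return mean, math.sqrt(variance / n)
```

For example, the hypothetical fold errors `[2.0, 4.0, 2.0, 4.0]` give a mean of 3.0 and a standard error of about 0.58.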


Table 4: Error rates (per cent, mean $\pm$ standard error) on the TClass domains.
Alg     CBF                   TTest                     Auslan                    ECG
J48     $2.3 \pm 0.7$         $3.3 \pm 0.9$             $14.5 \pm 0.4$            $45.5 \pm 1.7$
PART    $4.6 \pm 0.8$         $2.3 \pm 0.3$             $16.7 \pm 0.9$            $41.9 \pm 2.1$
Bag     $1.9 \pm 0.5$         $2.5 \pm 0.4$             $9.4 \pm 0.8$             $35.1 \pm 2.6$
AB      $1.4 \pm 0.3$         $1.0 \pm 0.3$             $6.4 \pm 0.4$             $32.9 \pm 2.4$
Vote    $\mathbf{0 \pm 0}$    $\mathbf{0.5 \pm 0.2}$    $\mathbf{2.1 \pm 0.2}$    $\mathbf{28.0 \pm 1.8}$
Naive   $\mathbf{0 \pm 0}$    $7.2 \pm 0.7$             $5.5 \pm 0.5$             $28.5 \pm 2.6$
HMM     $\mathbf{0 \pm 0}$    $4.4 \pm 0.5$             $12.9 \pm 0.6$            $33.5 \pm 1.7$


The results in Table 4 are very promising, although there are some qualifications. Firstly, in every domain the best TClass configuration performs as well as or better than the baseline learners; in the Auslan and TTest domains the improvement is significant at the 99.5 per cent level. Somewhat disappointingly, however, the results with voting are significantly better than those of any other TClass method in two of the domains. Voted solutions are less readable, and hence this forces a tradeoff between readability and accuracy. Voting converges with 9 voters for Auslan and with 11 voters for TTest.

The results on the ECG data are worthy of particular note, since chazal:phd uses the same dataset, although the focus of that work was feature generation for a subsequent neural network stage. By hand-crafting a feature set for a neural network, he obtained an error of $ 28.6\% \pm 2.4$. Given that we used only simple, generic metafeatures and made no use of the copious domain knowledge available on ECGs, it is surprising that TClass with voting reaches a comparable error of $ 28.0 \pm 1.8$ per cent. Furthermore, a survey by willems:ecg found that, on the same dataset, the median human cardiologist obtained an error of 29.7 per cent.

The results for the CBF domain are also surprising, because TClass with voting, naive segmentation and hidden Markov models all attain 100 per cent accuracy. It turns out that dividing the signal into 20 segments and taking the mean of each segment makes the problem linearly separable with high probability (depending on the Gaussian noise). This limitation of the CBF domain is what led us to the TTest domain.
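The segment-and-average step mentioned above can be sketched as follows. The function name and the handling of lengths that do not divide evenly are assumptions for illustration; only the idea of 20 equal segments with one mean each comes from the text.

```python
def segment_means(signal, n_segments=20):
    """Split `signal` into `n_segments` contiguous chunks and average each.

    Assumption: when the length is not a multiple of `n_segments`, chunk
    boundaries are placed by integer division, so chunk sizes differ by
    at most one sample.
    """
    n = len(signal)
    if n < n_segments:
        raise ValueError("signal is shorter than the number of segments")
    means = []
    for i in range(n_segments):
        lo = i * n // n_segments
        hi = (i + 1) * n // n_segments
        chunk = signal[lo:hi]
        means.append(sum(chunk) / len(chunk))
    return means
```

Each time series is thereby reduced to a fixed-length vector of 20 attributes that a conventional propositional learner can consume directly.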

As for comprehensibility, in the Auslan domain the definitions generated by TClass compared favourably with the definitions found in the Auslan dictionary [Johnston, 1989]. Furthermore, ruleset sizes were reasonable: in the Auslan domain, PART produced 1.14 rules per class. In the ECG domain, a simple set of 24 rules was found that obtained 40.5 per cent error. Some of these rules showed close correlations with the rules used by existing expert systems [Schiller Medical, 1997]. In the TTest domain, TClass was able to reconstruct the generating concept exactly at low to medium noise levels. In general, in domains with many classes (such as Auslan), it was found that binarizing the learning problem led to more comprehensible definitions than trying to understand long rulesets. Most people cannot understand a decision tree that classifies signs into one of 95 classes, but they can understand three rules that decide whether or not a sign is thank.
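Binarizing is read here as a one-versus-rest relabelling, in which each class in turn becomes the positive class and all others are merged into a single negative class. That reading is an assumption based on the thank example above, and the helper names below are hypothetical. A minimal sketch:

```python
def binarize_labels(labels, target):
    """Relabel a multi-class problem as `target` versus everything else."""
    return [label == target for label in labels]


def one_vs_rest_problems(labels):
    """Build one binary labelling per class, keyed by class name."""
    return {cls: binarize_labels(labels, cls) for cls in sorted(set(labels))}
```

For instance, `binarize_labels(["thank", "hello", "thank", "yes"], "thank")` marks the first and third examples as positive, and a rule learner trained on that labelling only has to describe what distinguishes thank from every other sign.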


Mohammed Waleed Kadous 2002-02-12