next up previous contents
Next: Comprehensibility Up: TTest - An artificial Previous: TTest   Contents

Experimental Results

As a starting dataset, we generated 1000 instances using the following parameters: $ g=0.1, d=0.2, c=0.2, h=0.2$, with irrelevant features switched on. Figures 6.15, 6.16 and 6.17 show instances of classes A, B and C respectively.

Figure 6.15: Examples of class A with default parameters.
[Figure: art2A.eps]

Figure 6.16: Examples of class B with default parameters.
[Figure: art2B.eps]

Figure 6.17: Examples of class C with default parameters.
[Figure: art2C.eps]

The results are shown in Table 6.5.


Table 6.5: Error rates on the TTest domain.
Approach                    Error (%)
TClass with J48             $3.3 \pm 0.9$
TClass with PART            $2.3 \pm 0.3$
TClass with IBL             $68.1 \pm 1.5$
TClass with Bagging/J48     $2.5 \pm 0.4$
TClass with AdaBoost/J48    $1.0 \pm 0.3$
TClass with Naive Bayes     $9.8 \pm 1.5$
Naive Segmentation          $7.2 \pm 0.7$
Hidden Markov Model         $4.4 \pm 1.5$


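As Table 6.5 shows, for this particular training set the tree- and rule-based TClass learners are more accurate than both baselines: TClass with AdaBoost/J48 has the lowest error, at 1.0 per cent, against 4.4 per cent for the hidden Markov model and 7.2 per cent for naive segmentation. TClass with Naive Bayes and, especially, TClass with IBL do worse than the baselines. Tables 6.6 and 6.7 break down the naive segmentation and hidden Markov model results by configuration; the lowest error rate in each is shown in bold.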


Table 6.6: Error rates of naive segmentation on the TTest domain.
Approach    Error (%) by number of segments
            3                 5                 10                20
J48         $28.7 \pm 0.9$    $34.1 \pm 1.3$    $23.4 \pm 0.9$    $7.4 \pm 1.0$
NB          $28.2 \pm 1.3$    $31.1 \pm 1.3$    $25.9 \pm 1.2$    $16.4 \pm 1.3$
IB1         $34.0 \pm 1.5$    $33.2 \pm 1.3$    $31.6 \pm 1.0$    $32.6 \pm 1.2$
AB          $30.5 \pm 1.5$    $29.7 \pm 1.3$    $26.3 \pm 1.0$    $7.8 \pm 1.2$
Bag         $30.6 \pm 0.9$    $32.3 \pm 1.4$    $23.1 \pm 1.0$    $\mathbf{7.2 \pm 0.7}$



Table 6.7: Error rates when using hidden Markov models on TTest domain.
Topology    Raw (%)           Raw + Derivative (%)
lr-3        $17.0 \pm 1.2$    $15.4 \pm 1.7$
lr-5        $13.4 \pm 2.1$    $25.3 \pm 0.9$
lr-10       $18.3 \pm 2.4$    $21.7 \pm 1.5$
lr-20       $21.9 \pm 1.3$    $19.9 \pm 1.6$
lrs1-3      $21.8 \pm 1.8$    $22.1 \pm 1.3$
lrs1-5      $29.8 \pm 1.9$    $23.5 \pm 1.2$
lrs1-10     $26.0 \pm 1.3$    $23.6 \pm 1.9$
lrs1-20     $24.8 \pm 1.7$    $24.1 \pm 1.8$
er-3        $26.0 \pm 1.7$    $15.9 \pm 1.2$
er-5        $12.5 \pm 1.0$    $4.8 \pm 0.4$
er-10       $12.5 \pm 1.0$    $\mathbf{4.4 \pm 0.6}$
er-20       $12.5 \pm 1.0$    $4.8 \pm 0.4$


To examine the effect of noise, we increased $ g$ to 0.2 and also looked at what happens when $ g=0$. The results with more noise are shown in Table 6.8, and the results with zero noise in Table 6.9. For brevity, the in-depth results for naive segmentation and HMMs are omitted.


Table 6.8: Error rates for high-noise situation (g=0.2) on TTest domain.
Approach                    Error (%)
TClass with J48             $17.4 \pm 2.1$
TClass with PART            $16.1 \pm 1.5$
TClass with IBL             $67.3 \pm 1.3$
TClass with Bagging/J48     $11.3 \pm 1.2$
TClass with AdaBoost/J48    $14.9 \pm 2.1$
TClass with Naive Bayes     $13.9 \pm 1.8$
Naive Segmentation          $13.2 \pm 1.6$
Hidden Markov Model         $9.9 \pm 1.1$



Table 6.9: Error rates with no Gaussian noise (g=0) on TTest domain.
Approach                    Error (%)
TClass with J48             $1.6 \pm 0.4$
TClass with PART            $1.4 \pm 0.5$
TClass with IB1             $66.7 \pm 1.4$
TClass with Naive Bayes     $11.2 \pm 2.3$
TClass with Bag/J48         $1.6 \pm 0.4$
TClass with AdaBoost/J48    $0.9 \pm 0.3$
Naive Segmentation          $2.1 \pm 0.5$
Hidden Markov Model         $4.8 \pm 0.9$


We plotted the results for each of these in Figure 6.18, excluding TClass/IB1, because its performance is so poor relative to the other learners.

Figure 6.18: Learner accuracy and noise
[Figure: ttest-noise.eps]

Clearly, the noise level has a significant effect on the quality of the learning. With no noise, the tree- and rule-based TClass learners all outperform the baseline learners. With the noise level at 0.2, most of the learners perform poorly. The least affected is the hidden Markov model; its statistical nature seems to have helped it cope. Again, if we are willing to sacrifice readability, we can simply vote several runs of TClass. Figure 6.19 shows the effect of voting on accuracy, using bagging as the base learner. Bagging was chosen because it was the best performer among the TClass learners in the unvoted results, but any of the other learners could have been used.

Figure 6.19: Voting different runs of TClass to reduce error with g=0.2.
[Figure: ttest-bagvote.eps]

This shows the dramatic effect that voting can have on performance. In the limit, voted TClass achieves roughly half the error of the best-performing hidden Markov model (9.9 per cent): it appears to approach asymptotically an error of about 4.7 per cent when voting 11 bagged TClass learners. Unfortunately, this comes at a price: readability. For the minimum error case, a human would have to look at the results of 110 trees and consider how each of them votes.
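The voting step can be illustrated as a plain majority vote over the class labels predicted by several independently trained learners. This is a minimal sketch, not the TClass implementation: the names `majority_vote` and `error_rate`, the tie-breaking rule and the toy predictions are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-learner label lists into one label per instance.

    predictions[k][i] is the label that learner k assigns to instance i.
    Ties go to the label that was seen first in learner order, which is
    an arbitrary choice made for this sketch.
    """
    n_instances = len(predictions[0])
    voted = []
    for i in range(n_instances):
        votes = Counter(run[i] for run in predictions)
        voted.append(votes.most_common(1)[0][0])
    return voted


def error_rate(predicted, actual):
    """Fraction of instances whose predicted label is wrong."""
    wrong = sum(p != a for p, a in zip(predicted, actual))
    return wrong / len(actual)


# Toy data (hypothetical): three learners, each wrong on a different instance.
actual = ["A", "B", "C", "A", "B"]
runs = [
    ["A", "B", "C", "A", "C"],
    ["A", "B", "A", "A", "B"],
    ["B", "B", "C", "A", "B"],
]
voted = majority_vote(runs)
print([error_rate(run, actual) for run in runs], error_rate(voted, actual))
```

In this toy case each learner makes one mistake, but because the mistakes fall on different instances the vote corrects all of them. The cost is the one noted above: to understand a voted prediction, every constituent model has to be inspected.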

We were also curious as to why hidden Markov models and naive segmentation performed so well. Since HMMs are statistical models, they perform best when there are many instances, especially with the high level of noise when $ g = 0.2$. Given that the above experiments had 300 training examples for each class, perhaps the TTest dataset provides too many examples. We therefore generated a smaller dataset consisting of only 100 examples, giving 30 training examples per class.
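The per-class figures of 300 and 30 are consistent with ten-fold cross-validation over three equally sized classes. That evaluation protocol is an assumption here, since this section does not state it; the helper below is a hypothetical sketch of the arithmetic only.

```python
def training_examples_per_class(n_instances, n_classes, n_folds):
    """Training examples per class in each fold of k-fold cross-validation.

    Assumes balanced classes and that each fold holds out 1/n_folds of
    the data for testing (both are assumptions, not stated in the text).
    """
    n_train = n_instances * (n_folds - 1) // n_folds
    return n_train // n_classes


large = training_examples_per_class(1000, 3, 10)  # 900 training instances
small = training_examples_per_class(100, 3, 10)   # 90 training instances
print(large, small)
```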

We ran further experiments with this set of 100 examples. Abbreviated results are shown in Table 6.10. In order to get statistically significant results, the TClass runs were repeated 3 times and averaged (not voted).
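Averaging repeated runs, as opposed to voting them, only summarises their error rates; the models are never combined. The sketch below shows one way such a "mean $ \pm$ spread" figure could be computed. It assumes the spread is the standard error of the mean, which this section does not confirm, and the error rates in it are made-up placeholders rather than experimental data.

```python
import math
import statistics


def mean_and_standard_error(values):
    """Return (mean, standard error of the mean) for a list of numbers."""
    mean = statistics.mean(values)
    # Sample standard deviation divided by sqrt(n) gives the standard error.
    stderr = statistics.stdev(values) / math.sqrt(len(values))
    return mean, stderr


# Hypothetical error rates (per cent) from three repeated runs, two folds
# each; these numbers are invented and are not taken from the experiments.
repeated_runs = [
    [10.0, 20.0],
    [30.0, 20.0],
    [10.0, 30.0],
]

# Averaging (not voting): pool the error rates and summarise them.
pooled = [err for run in repeated_runs for err in run]
mean, stderr = mean_and_standard_error(pooled)
print(f"{mean:.1f} +/- {stderr:.1f}")
```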


Table 6.10: Error rates for high-noise situation (g=0.2), but with only 100 examples on TTest domain.
Approach                    Error (%)
TClass with J48             $16.7 \pm 2.0$
TClass with PART            $19.7 \pm 2.3$
TClass with IBL             $69.2 \pm 2.0$
TClass with Bagging/J48     $12.7 \pm 2.3$
TClass with AdaBoost/J48    $\mathbf{11.0 \pm 1.5}$
TClass with Naive Bayes     $19.0 \pm 2.3$
Naive Segmentation          $21.0 \pm 3.0$
Hidden Markov Model         $17.0 \pm 3.5$


Figure 6.20: Error rates of different learners with 100 and 1000 examples.
[Figure: ttest-1000vs100.eps]

The results are surprising. Firstly, note that for the TClass learners, bagging and boosting did not suffer from the tenfold reduction in data. Compared with the runs that had 300 training examples per class (Table 6.8), the error of AdaBoost actually fell from 14.9 to 11.0 per cent, and the error of bagging rose only from 11.3 to 12.7 per cent, which is within the quoted error margins. This suggests that, with the larger training set, the boosting and bagging systems are overfitting the data. To confirm this, we looked at the average number of leaves in each tree produced when bagging with 100 examples and 1000 examples. We found that with 100 examples there was an average of 5.8 leaves per tree, whereas with 1000 examples there was an average of 37.0 leaves per tree. Theoretically, the three classes are orthogonal; in other words, we know that, in the absence of noise, they could be classified with 3 leaf nodes. The results do show, however, that the tree-based TClass learners perform better than the statistical approaches when the number of training instances in this domain is small.

This indicated that it might be possible to get more accurate trees in high-noise situations by pruning the tree-based learners more aggressively. We set the confidence for pruning to 5% instead of the default 25%, and set the minimum number of instances per node to 10 instead of the default 2. In the high-noise case (g=0.2), this led to a massive reduction in tree size together with an improvement in accuracy. For J48, the average tree size was reduced from 47.3 leaf nodes to 12.2 leaf nodes, while the error rate was reduced from 17.4 per cent to 13.2 per cent. This seems to be a strong gain, making the results both more comprehensible and more accurate.
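Assuming J48 here is the Weka learner of that name, the two settings correspond to its documented command-line options `-C` (pruning confidence) and `-M` (minimum number of instances per leaf). How TClass passes options to its learners is not described in this section, so the helper `j48_options` below is hypothetical and only shows how the default and the more aggressive settings differ.

```python
def j48_options(confidence=0.25, min_instances=2):
    """Build Weka J48 pruning options as a command-line argument list.

    -C sets the pruning confidence (smaller values prune more) and
    -M sets the minimum number of instances per leaf. The defaults
    match the defaults quoted in the text (25% and 2).
    """
    return ["-C", str(confidence), "-M", str(min_instances)]


default_options = j48_options()
aggressive_options = j48_options(confidence=0.05, min_instances=10)
print(default_options, aggressive_options)
```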


Mohammed Waleed Kadous 2002-12-10