As a starting dataset, we generated 1000 instances, using the following parameters:
with irrelevant features switched on.
Figures 6.15, 6.16 and
6.17 show instances of classes A, B and C respectively.
The results are shown in Table 6.5.
As we can see, for this particular training set, TClass performs better in terms of accuracy.
To examine the effect of noise, we increased the noise level g
to 0.2, and also
looked at what happens when
g = 0. The results for more noise are
shown in Table 6.8, and the results with zero
noise are shown in Table 6.9. For brevity, the
in-depth results for naive segmentation and HMMs are omitted.
We plotted the results for each of these in Figure 6.18, excluding TClass/IB1, because its performance is so poor relative to the other learners.
Clearly, the noise level has a significant effect on the quality of the learning. With no noise, every TClass learner outperforms the baseline learners. With the noise level at 0.2, most of the learners perform poorly. The least affected is the hidden Markov model; its statistical nature seems to have helped it cope. Again, if we are willing to sacrifice readability, we can simply vote TClass. Figure 6.19 shows the effect of voting to improve accuracy, using bagging as the base learner. Bagging was chosen because it was the best performer in the unvoted results among the TClass learners, but voting could be done with any of the other learners.
This shows the dramatic effect that voting can have on performance. In
the limit, voted TClass does approximately twice as well
as the best-performing hidden Markov model: it appears to
asymptotically reach an error of about 4.7 per cent when voting
11 bagged TClass learners. Unfortunately,
this comes at a price: readability. For the minimum error case, a
human reader would have to look at the results of 110 trees and consider how
each of them votes.
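To make the voting step concrete, the following is a minimal sketch of plain majority voting over an ensemble, not the TClass implementation itself. The names `majority_vote` and `vote_ensemble` are hypothetical, each classifier is assumed to be a callable that maps an instance to a class label, and the tie-breaking rule (first label seen wins) is an arbitrary choice made for the sketch.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label that occurs most often in `predictions`.

    Ties go to the label that appears first in the list; this is an
    arbitrary choice for the sketch, not something the text specifies.
    """
    counts = Counter(predictions)
    return counts.most_common(1)[0][0]


def vote_ensemble(classifiers, instance):
    """Ask every classifier for a label and return the majority label.

    Each classifier is assumed to be a callable: instance -> label.
    """
    return majority_vote([clf(instance) for clf in classifiers])


# Stand-in classifiers (hypothetical): 6 of the 11 voters say "A", 5 say "B".
voters = [lambda x: "A"] * 6 + [lambda x: "B"] * 5
winner = vote_ensemble(voters, instance=None)
```

With 11 voters of this kind, a reader who wants to understand a single decision has to inspect every voter, which is the readability cost described above.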
We were also curious as to why hidden Markov models and naive
segmentation performed so well. Since HMMs are statistical models,
they perform best when there are many instances, especially with the
high level of noise when
g = 0.2. Given that in the above
experiments there are 300 training examples for each class, perhaps
the TTest provides too many examples. We therefore generated a smaller
dataset consisting of only 100 examples, which leaves 30 examples
per class for training.
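The per-class training counts quoted above can be reproduced with a short calculation. This sketch assumes 10-fold cross-validation and equally sized classes; neither is stated in this section, but both are consistent with 1000 instances giving 300 training examples per class. The function name is hypothetical.

```python
def training_examples_per_class(n_instances, n_classes=3, n_folds=10):
    """Training examples available per class in one cross-validation fold.

    Assumes balanced classes and that one fold out of `n_folds` is held
    out for testing (an assumption, not stated in the text).
    """
    train_size = n_instances * (n_folds - 1) // n_folds
    return train_size // n_classes


large = training_examples_per_class(1000)  # the original dataset
small = training_examples_per_class(100)   # the reduced dataset
```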
We ran further experiments with this set of 100 examples. Abbreviated results are shown in Table 6.10. In order to get statistically significant results, the TClass runs were repeated 3 times and averaged (not voted).
The results are surprising. Firstly, note that for the TClass learners, the performance of bagging and boosting seems to have improved relative to when they had 300 examples per class. This suggests that the boosting and bagging systems are overfitting the data when given the larger training set. To confirm this, we looked at the average number of leaves in each tree produced when bagging with 100 examples and 1000 examples. We found that with 100 examples, there was an average of 5.8 leaves per tree, whereas with 1000 examples there was an average of 37.0 leaves per tree. Theoretically, the three classes are orthogonal; in other words, we know they could, in the absence of noise, be classified with 3 leaf nodes. The results do show, however, that the TClass learners perform better than the statistical approaches when the number of training instances in this domain is small.
This suggested that it might be possible to get more accurate trees in high noise situations by pruning the tree-based learners more aggressively. We set the confidence for pruning to 5% instead of the default 25%, and set the minimum number of instances per node to 10 instead of the default 2. In the high noise case (g=0.2), this led to a massive reduction in tree size with an improvement in accuracy. For J48, the average tree size was reduced from 47.3 leaf nodes to 12.2 leaf nodes. At the same time, the error rate was reduced from 17.4 per cent to 13.2 per cent. This seems to be a strong gain, making the results both more comprehensible and more accurate.
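To put the J48 figures above on a common scale, the following sketch computes the relative reduction in tree size and in error rate. It uses only the numbers quoted in the text; the helper name `relative_reduction` is hypothetical.

```python
def relative_reduction(before, after):
    """Fraction by which a quantity shrank, relative to its starting value."""
    return (before - after) / before


# Figures quoted in the text for J48 at g = 0.2.
leaf_reduction = relative_reduction(47.3, 12.2)   # average leaf nodes per tree
error_reduction = relative_reduction(17.4, 13.2)  # error rate in per cent

print(f"leaves: {leaf_reduction:.1%}, error: {error_reduction:.1%}")
```

Under these figures the trees shrink by roughly three quarters while the error rate falls by roughly a quarter.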