One of the most active areas of recent research in machine learning has been the use of ensemble classifiers. An ensemble classifier consists not of a single classifier but of a collection of classifiers, each of which is built by the learner given different parameters or data. Each classifier is run on the test instances and casts a ``vote''. The votes are then collated, and the class with the greatest number of votes becomes the final classification.
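The voting step can be illustrated with a minimal Python sketch. It assumes that each classifier is simply a function from an instance to a class label; the names `majority_vote` and `ensemble` are hypothetical, chosen for illustration, and the three threshold classifiers are toy stand-ins rather than anything produced by a real learner.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

# Assumption: a classifier is any callable mapping an instance to a class label.
Classifier = Callable[[float], Hashable]


def majority_vote(classifiers: Sequence[Classifier], instance: float) -> Hashable:
    """Return the class that receives the most votes for one instance."""
    if not classifiers:
        raise ValueError("need at least one classifier")
    votes = Counter(clf(instance) for clf in classifiers)
    # most_common(1) yields [(label, count)]; on a tie, the label
    # that was voted for first wins.
    return votes.most_common(1)[0][0]


# Toy ensemble: three threshold classifiers that disagree near the boundary.
ensemble = [
    lambda x: "pos" if x > 0 else "neg",
    lambda x: "pos" if x > 1 else "neg",
    lambda x: "pos" if x > 2 else "neg",
]

print(majority_vote(ensemble, 1.5))  # two of the three classifiers vote "pos"
```

Ties are broken here in favour of the label voted for first, which is an arbitrary choice; other tie-breaking rules are equally plausible.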
Two approaches are popular: bagging and boosting.
Research (in particular the excellent work in [BK99]) indicates that bagging gives a reliable (i.e. effective in many domains) but modest improvement in accuracy, whereas boosting produces a less reliable but, when it does work, greater improvement in accuracy. The problem is that neither approach improves readability.
With TClass, a third way of building an ensemble is possible if synthetic feature construction is non-deterministic (e.g. if the RandSearch algorithm of Figure 4.12 is used). In that case, each TClass run produces different synthetic features from the same data, and hence a different classifier. The predictions of these classifiers can then be combined by voting. Indeed, because TClass is learner-independent, this approach can be combined with the other two. For instance, within each run of TClass we could employ AdaBoost with decision trees, and then vote over the resulting TClass classifiers as well.
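The idea of voting over repeated runs of a non-deterministic learner can be sketched as follows. This is not TClass or RandSearch: `train_randomized_stump` is a hypothetical stand-in that picks a random cut point on a single numeric feature, so that each run on the same data may yield a different classifier. All function names and the toy data are assumptions made for illustration.

```python
import random
from collections import Counter


def train_randomized_stump(data, rng):
    """Hypothetical stand-in for one non-deterministic run of a learner.

    data is a list of (value, label) pairs with at least two distinct values.
    A cut point is drawn at random from the midpoints between adjacent
    distinct values, and each side is labelled with its majority class.
    """
    values = sorted({x for x, _ in data})
    cuts = [(lo + hi) / 2 for lo, hi in zip(values, values[1:])]
    threshold = rng.choice(cuts)  # the only source of randomness
    left = Counter(y for x, y in data if x <= threshold)
    right = Counter(y for x, y in data if x > threshold)
    left_label = left.most_common(1)[0][0]
    right_label = right.most_common(1)[0][0]
    return lambda x: left_label if x <= threshold else right_label


def build_ensemble(data, n_runs, seed=0):
    """Run the randomized learner n_runs times on the same data."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [train_randomized_stump(data, rng) for _ in range(n_runs)]


def vote(ensemble, x):
    """Combine the ensemble's predictions for x by majority vote."""
    return Counter(clf(x) for clf in ensemble).most_common(1)[0][0]


data = [(0.0, "a"), (1.0, "a"), (2.0, "a"), (3.0, "b"), (4.0, "b"), (5.0, "b")]
ensemble = build_ensemble(data, n_runs=25)
print(vote(ensemble, 0.0), vote(ensemble, 5.0))
```

Because the ensemble treats each member as an opaque function, the stand-in learner could in principle be replaced by any other procedure that returns a classifier, including one that itself applies boosting internally.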