

Voting and Ensembles

One of the most active areas of recent research in machine learning has been the use of ensemble classifiers. An ensemble classifier is a collection of classifiers rather than a single one, each of which is built by the learner from different parameters or different data. Each classifier is run on the test instances and casts a "vote" for a class. The votes are then collated, and the class with the greatest number of votes becomes the final classification.
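
As an illustration of the voting step, the following is a minimal Python sketch. It assumes that each classifier is a callable mapping an instance to a class label. The function names and the tie-breaking rule (the earliest vote among the tied classes wins) are assumptions made for the example and are not part of TClass.

```python
from collections import Counter


def majority_vote(votes):
    """Return the class with the most votes.

    Ties go to whichever tied class appears first in `votes`
    (an assumption made for this sketch).
    """
    counts = Counter(votes)
    best = max(counts.values())  # raises ValueError if there are no votes
    for v in votes:
        if counts[v] == best:
            return v


def ensemble_classify(classifiers, instance):
    """Run every classifier on the instance and collate the votes."""
    return majority_vote([clf(instance) for clf in classifiers])
```

For example, `ensemble_classify` applied to three classifiers that answer "a", "b" and "a" for some instance returns "a".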

Two approaches are popular:

  1. Bagging ([Bre96]): The training set is randomly sampled with replacement, so that each learner receives a different training set. The resulting classifiers are then voted.

  2. Boosting ([Sch99]): The learner is applied to the training set as usual, except that each instance has an associated weight. All instances start with equal weight. After each classifier is built, any training instances it misclassifies are given greater weight, and the learner is applied again. The process is repeated until (a) a maximum number of classifiers has been constructed or (b) there are no incorrectly classified training instances. To make a classification, each of the individual classifiers is applied, and the results are then voted.
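
The two procedures above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the algorithms of [Bre96] or [Sch99]: `learn(data, weights)` is a hypothetical weight-aware learner that returns a callable classifier, `data` is a list of (instance, label) pairs, and the reweighting rule (multiply the weight of each misclassified instance by a fixed factor, then renormalise) is a simple stand-in for the update used by a real boosting algorithm such as AdaBoost, which also weights the votes themselves.

```python
import random
from collections import Counter


def vote(classifiers, instance):
    """Unweighted majority vote over the ensemble's predictions."""
    return Counter(clf(instance) for clf in classifiers).most_common(1)[0][0]


def bagging(learn, data, n_classifiers, rng):
    """Train each classifier on a sample drawn with replacement."""
    ensemble = []
    for _ in range(n_classifiers):
        # Bootstrap sample of the same size as the training set.
        sample = [rng.choice(data) for _ in data]
        weights = [1.0 / len(sample)] * len(sample)
        ensemble.append(learn(sample, weights))
    return ensemble


def boosting(learn, data, max_classifiers, factor=2.0):
    """Reweight misclassified instances after each round.

    `factor` is an assumed, illustrative update; it is not AdaBoost's.
    """
    weights = [1.0 / len(data)] * len(data)  # all instances start equal
    ensemble = []
    while len(ensemble) < max_classifiers:   # stopping condition (a)
        clf = learn(data, weights)
        ensemble.append(clf)
        wrong = [clf(x) != y for x, y in data]
        if not any(wrong):                   # stopping condition (b)
            break
        weights = [w * factor if bad else w for w, bad in zip(weights, wrong)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble
```

In both cases the resulting list of classifiers is passed to `vote` to classify a new instance. For bagging, `rng` is expected to be a `random.Random` instance so that runs can be reproduced.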

Research (in particular the excellent work in [BK99]) indicates that bagging gives a reliable but modest improvement in accuracy, where reliable means effective in many domains, whereas boosting gives a greater but less reliable improvement (when it does work). The problem is that neither approach improves readability.

With TClass, a third way of building an ensemble is possible if synthetic feature construction is non-deterministic (e.g. if the RandSearch algorithm of Figure 4.12 is used). In that case, each TClass run constructs different synthetic features from the same data and hence produces a different classifier. The outcomes of these classifiers can be voted. Indeed, because TClass is learner-independent, this approach can be combined with bagging or boosting. For instance, within each run of TClass we could employ AdaBoost with decision trees, and then vote the resulting TClass classifiers as well.
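
A minimal Python sketch of this third approach follows. Here `train_once(data, rng)` is a hypothetical stand-in for one complete TClass run whose synthetic feature search draws its randomness from `rng`; it is not the TClass interface. The sketch shows only the idea of repeating a non-deterministic run on the same data and voting the results.

```python
import random
from collections import Counter


def randomised_ensemble(train_once, data, n_runs, seed=0):
    """Repeat a non-deterministic training run on the same data.

    Each run receives its own random generator, derived from `seed`,
    so the whole ensemble can be rebuilt reproducibly.
    """
    master = random.Random(seed)
    return [
        train_once(data, random.Random(master.getrandbits(32)))
        for _ in range(n_runs)
    ]


def vote(classifiers, instance):
    """Unweighted majority vote over the classifiers' predictions."""
    return Counter(clf(instance) for clf in classifiers).most_common(1)[0][0]
```

Because `train_once` is an arbitrary callable, it could itself wrap a boosted learner, which corresponds to the combination described above.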


Mohammed Waleed Kadous 2002-12-10