next up previous contents
Next: Alternative approaches Up: Work on extending TClass Previous: Downsampling   Contents


Feature subset selection

Another way to improve the accuracy and comprehensibility that has been explored in machine learning is the use of feature subset selection. Sometimes, using all the features available has a detrimental effect on accuracy when compared to using some subset of the features. This is because some features are dependent on other features - and hence, there is no need to include the feature - or are noisy. Furthermore, the more features there are, the more likely that some feature will randomly fit the data and hence the probability of overfitting increases. Thus removing these features would lead to both improved accuracy, and clearer descriptions of learnt concepts, since there are less features that the user has to deal with. Furthermore, since we are generating many hundreds of features, many of which are likely to be irrelevant, we would expect to benefit from finding some optimal subset of features.

However, there is a problem: feature subset selection is a difficult problem. If there are $ n$ features, then there are $ 2^n$ possible feature subsets. Searching this space for a ``good'' set of features can be extremely time-consuming. There are two possible approaches to feature subset selection: the wrapper approach and the filter approach [JKP94]. In the wrapper approach, the learner is applied to subsets of features and tested on an hold-out set (or, if execution time is not an issue, using cross-validation). From the results of these tests, a good subset of features is selected. For example, for forward selection, a classifier is built for each feature individually; and the most accurate feature is ``accepted'' into the subset of good features. That feature is removed, and the process is repeated, adding each of the remaining features and evaluating its performance. The ``best'' set of two features is thus created. This proceeds incrementally until a feature set with maximal accuracy is achieved. Similarly, backward selection proceeds by eliminating one feature at a time, finding the least beneficial feature and eliminating it, and repeating the process, eliminating the least accurate features until eliminating further features decreases accuracy.

The other approach to feature subset selection is the ``filter'' approach, which looks at correlations between features and other performance measures to decide a priori, independent of any particular learner, what a good subset of features is. Although such techniques do not perform nearly as well as the wrapper approach, they are much faster. In general, if there are $ n$ features, the wrapper approach requires $ O(n^2)$ runs of the learner, whereas the filter approach requires only one run of the learner and some simple statistics on the dataset to be calculated.


next up previous contents
Next: Alternative approaches Up: Work on extending TClass Previous: Downsampling   Contents
Mohammed Waleed Kadous 2002-12-10