As a basis for comparison, the attributes judged most effective in the previous chapter's investigation were used. It is possible that at high levels of accuracy, when the more effective attributes are working well, the less effective attributes are not merely performance-neutral but actively degrade performance, since they introduce ``attribute noise''.
These were thus removed. In particular, this meant that the synthesised histograms and the time features were excluded because of their poor performance.
This is a very simplified approach, and it may be that the elements of the feature set are not orthogonal -- i.e. there is some overlap between the information they provide. Had time permitted, a linear independence investigation could have been undertaken to determine the relative contributions of the attributes included here.
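One simple form such an overlap investigation could take is a pairwise correlation check between attribute columns: highly correlated attributes supply largely redundant information. The function and sample values below are purely illustrative, not drawn from the data in this work.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two attribute columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Invented values: attribute b is roughly 2*a, i.e. near-redundant,
# so the correlation is close to 1 and b adds little new information.
a = [1.0, 2.0, 3.0, 4.0]
b = [2.1, 3.9, 6.2, 8.0]
print(pearson(a, b))
```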
In fact, the feature set can be optimised further still, by modifying the similarity function of the IBL algorithm, or by creating additional weighted features for C4.5. Here features have simply been included or excluded; more generally, ``weights'' can be assigned to attributes to give a better similarity function. Finding the optimal similarity function is not a trivial task, and search algorithms have to be used.
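As a sketch of what an attribute-weighted similarity function might look like in an IBL1-style nearest-neighbour classifier. The feature values, weights and class labels below are invented for illustration; note that binary inclusion/exclusion of features, as used here, is recovered as the special case of 0/1 weights.

```python
def weighted_similarity(x, y, weights):
    """Negative weighted Euclidean distance: larger means more similar."""
    return -sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)) ** 0.5

def classify(instance, training_set, weights):
    """IBL1-style classification: take the class of the most similar stored instance."""
    best = max(training_set,
               key=lambda ex: weighted_similarity(instance, ex[0], weights))
    return best[1]

# Inclusion/exclusion as a special case: the middle (noisy) attribute
# is excluded by giving it zero weight. All values are illustrative.
weights = [1.0, 0.0, 1.0]
train = [([1.0, 9.0, 1.0], "A"),
         ([5.0, 0.0, 5.0], "B")]
print(classify([1.2, 0.0, 0.9], train, weights))  # -> A
```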
The results of the tests on this smaller set of attributes are shown below:
Table 6.2: Results of learning algorithms on the most effective attributes combined
The above shows that in all of the IBL1 cases, the results are better than when the poorly discriminating features are included, by somewhere between 1 and 5 per cent. The situation with C4.5 is more ambiguous, but at this point C4.5 appears to be significantly worse than IBL1, by some 30 per cent; it does not appear to be a serious rival to IBL1.
This result suggests that further optimisations are possible: techniques such as forward selection, Gauss-Seidel hill-climbing, simulated annealing or other search techniques could be used to obtain a more accurate weighting function, since some of the features are, as illustrated here, better discriminants than others, and the better discriminators should be given a higher weighting.
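A minimal sketch of such a weight search is given below, using a simple stochastic hill-climber. The `evaluate` callback, step size and iteration count are assumptions for illustration only -- in practice `evaluate` would return a held-out accuracy (e.g. from a leave-one-out run of the classifier), and this is not the procedure actually used in this work.

```python
import random

def hill_climb_weights(evaluate, n_features, steps=200, delta=0.1, seed=0):
    """Stochastic hill-climbing over an attribute-weight vector.

    `evaluate(weights)` is assumed to return a score to maximise,
    standing in for classification accuracy with those weights.
    """
    rng = random.Random(seed)
    weights = [1.0] * n_features        # start with all attributes equal
    best_score = evaluate(weights)
    for _ in range(steps):
        candidate = weights[:]
        i = rng.randrange(n_features)   # perturb one weight at a time
        candidate[i] = max(0.0, candidate[i] + rng.choice([-delta, delta]))
        score = evaluate(candidate)
        if score > best_score:          # accept only strict improvements
            weights, best_score = candidate, score
    return weights, best_score

# Toy objective standing in for held-out accuracy, peaked at (0.5, 1.5):
demo = lambda w: 1.0 - abs(w[0] - 0.5) - abs(w[1] - 1.5)
found, score = hill_climb_weights(demo, 2)
```

Forward selection fits the same interface: instead of perturbing continuous weights, each step toggles one attribute's weight between 0 and 1 and keeps the toggle only if the score improves.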