Metafeatures have been applied to diverse domains that exhibit difficult properties: they have been tested on domains with up to 22 channels, 110 metafeatures, 200 megabytes of data, 100 classes, and highly skewed class distributions. They have been shown capable of producing high-accuracy classifiers; in fact, classifiers whose accuracy matches that of hand-crafted preprocessing techniques. Although the user must define the metafeatures, we have shown that a generic family of metafeatures works for temporal domains, and similarly that one simple metafeature works for the Chinese character domain. Furthermore, metafeatures produce comprehensible descriptions. However, results show that the approach has trouble generating rules that are simultaneously accurate and comprehensible.
This suggests one avenue for future work. The marked difference between voted and unvoted results points to the weakness of the random search for a good segmentation. Several solutions are being explored, including using a decision tree builder with heavy pruning to do the directed segmentation, putting the learner into the loop for selecting a good segmentation, and using a hill-climbing approach.
There are several other issues to explore: automatic selection and
generation of metafeatures; and letting the class label be more complex
(for example, in temporal domains, allowing it
to be a sequence of
class labels). We also plan several practical steps: applying
metafeatures to new domains, for example robot vision; and
refining, documenting and releasing the TClass code.