One of the advantages of metafeatures is that they provide a mechanism for generating human-readable descriptions of temporal processes. How then do we fit this into the TClass architecture?
First, we cannot produce human-readable descriptions without making
some assumptions about the learner: it must produce a classifier that
expresses the learnt concept as bounds on attribute values. In other
words, each part of the concept is expressed as a test of whether a
single attribute value is less than, greater than or equal to a
particular value. This is a fairly wide family of
learners and includes decision trees, decision lists, stumps and
ensembles of any of these.
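The assumption above can be illustrated with a minimal sketch of a decision list whose every condition is a bound on a single attribute. This is not the TClass implementation; the class names, the attribute names (`loudrun_1`, `mean_volume`) and the labels are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
import operator

# Comparison operators that a single-attribute test may use.
OPS = {"<": operator.lt, "<=": operator.le,
       ">": operator.gt, ">=": operator.ge, "==": operator.eq}


@dataclass(frozen=True)
class ThresholdTest:
    attribute: str  # name of one attribute (hypothetical names below)
    op: str         # one of the keys of OPS
    value: float    # the bound the attribute value is compared against

    def holds(self, instance):
        return OPS[self.op](instance[self.attribute], self.value)


@dataclass(frozen=True)
class Rule:
    tests: tuple    # conjunction of ThresholdTest objects
    label: str


def classify(rules, default, instance):
    """Decision list: return the label of the first rule whose tests all hold."""
    for rule in rules:
        if all(test.holds(instance) for test in rule.tests):
            return rule.label
    return default


# Hypothetical one-rule decision list over two attributes.
rules = [Rule((ThresholdTest("loudrun_1", ">", 0.5),
               ThresholdTest("mean_volume", "<=", 3.0)), "A")]
print(classify(rules, "B", {"loudrun_1": 0.8, "mean_volume": 2.0}))
```

Because every condition has the form attribute, comparison, value, each one can later be rewritten independently, which is what makes relabelling possible.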
We can postprocess the generated classifier by checking for any
temporal attributes. We can
then use an attribute relabelling approach to create a comprehensible
description using the synthesised feature as a substitute for the
instantiated feature as discussed in Section 4.10.2.
This obviously requires the synthesised features. As discussed in that
section, we can also use the instantiated features to create useful
bounds.
Although we do produce a classifier intended for human understanding,
it is not the one used to classify. It is an approximation of
the learnt concept, rather than the learnt concept itself. Experts
exhibit similar behaviour ([Sha88] and [CJ89]): very few can express
their expertise completely enough for someone else to understand
fully what they have learnt; rather, they provide an approximation of
what has been learnt.
To describe how attribute relabelling works, let the synthetic
features be denoted by the notation $M_i$, where $M$ is a metafeature
applied to the data and $M_i$ is the attribute value based on the
relative membership of the $i$th synthetic feature. If the classifier
contains a test of the form $M_i > c$, this means that the classifier
checks for an instantiated feature that has a relative membership for
the $i$th synthesised feature exceeding $c$. Conversely, if it has a
predicate of the form $M_i \le c$, it is checking for the absence of
such an instantiated feature.
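The reading of a single test described above can be sketched as a small function: a test that the relative membership exceeds a threshold is read as the presence of a matching event, and the opposite test as its absence. The function name and the output wording are hypothetical, not taken from TClass.

```python
def describe_membership_test(event_name, op, threshold):
    """Turn one test on a relative-membership attribute into a phrase.

    event_name is a readable name for the synthesised feature; op is the
    comparison used by the classifier; threshold is the bound it tests.
    """
    if op in (">", ">="):
        # Membership above the bound: a matching instantiated feature exists.
        return f"has {event_name} (membership {op} {threshold})"
    if op in ("<", "<="):
        # Membership at or below the bound: no such instantiated feature.
        return f"has no {event_name} (membership {op} {threshold})"
    raise ValueError(f"unsupported comparison: {op!r}")


print(describe_membership_test("LoudRun[1]", ">", 0.5))
print(describe_membership_test("LoudRun[2]", "<=", 0.5))
```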
Attribute relabelling in this manner is easy to implement as a sequence of string substitutions on the output from the learner. Figure 5.14 shows the actual classifier produced by TClass on the Tech Support domain. Using this information, together with the information about the instantiated features, we can convert this to the form shown in Figure 5.15.
Figure 5.15 shows the post-processed form of Figure 5.14. For simplicity, the information on the bounds has been moved into a separate event index. The event index shows the range of bounds for the start and the duration of the LoudRun.
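Relabelling by string substitution, with the bounds kept in a separate event index, might be sketched as follows. The rule text, attribute names, event names and bounds below are invented for illustration and do not reproduce Figures 5.14 or 5.15; the function names are likewise hypothetical.

```python
import re


def relabel(classifier_text, event_names):
    """Substitute each synthetic attribute name with its readable event name."""
    if not event_names:
        return classifier_text
    # Longest names first so the alternation tries "loudrun_10" before "loudrun_1".
    alternatives = sorted(event_names, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, alternatives)) + r")\b")
    # A single pass, so a replacement is never itself relabelled.
    return pattern.sub(lambda m: event_names[m.group(1)], classifier_text)


def event_index(event_names, bounds):
    """List the parameter ranges of each event separately from the rules."""
    lines = []
    for attr, name in event_names.items():
        ranges = ", ".join(f"{param} in [{lo}, {hi}]"
                           for param, (lo, hi) in bounds[attr].items())
        lines.append(f"{name}: {ranges}")
    return "\n".join(lines)


# Hypothetical learner output and instantiated-feature bounds.
raw = "IF loudrun_1 > 0.5 AND loudrun_2 <= 0.5 THEN class = A"
names = {"loudrun_1": "LoudRun[1]", "loudrun_2": "LoudRun[2]"}
bounds = {"loudrun_1": {"start": (0, 4), "duration": (3, 9)},
          "loudrun_2": {"start": (10, 15), "duration": (1, 2)}}

print(relabel(raw, names))
print(event_index(names, bounds))
```

Keeping the bounds out of the rule text keeps each rule short; the reader looks up an event in the index only when its parameter ranges matter.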