Next: Temporal and spatial Analysis Up: Practical implementation Previous: Developing metafeatures   Contents


Producing human-readable output in TClass

One of the advantages of metafeatures is that they provide a mechanism for generation of human-readable descriptions of temporal processes. How then do we fit this into the TClass architecture?

First, we cannot produce human-readable descriptions without making some assumptions about the learner: it must produce a classifier that expresses the learnt concept as bounds on attribute values. In other words, each part of the concept is expressed as a test on whether a single attribute value is less than, greater than or equal to a particular value. This is a fairly wide family of learners and includes decision trees, decision lists, stumps and ensembles of any of these.

We can postprocess the generated classifier by checking for any temporal attributes. We can then use attribute relabelling to create a comprehensible description, substituting the synthesised feature for the instantiated feature as discussed in Section 4.10.2. This obviously requires access to the synthesised features. As discussed in that section, we can also use the instantiated features to create useful bounds.

Although we do produce a classifier intended for human understanding, it is not the one used to classify. It is an approximation of the learnt concept, rather than the learnt concept itself. Experts exhibit similar behaviour ([Sha88] and [CJ89]): very few can express their expertise completely enough for someone else to understand fully what they have learnt; rather, they provide an approximation of what has been learnt.

To describe how attribute relabelling works, let the synthesised features be denoted by $\mathit{met}_n$, where $\mathit{met}$ is a metafeature applied to the data and $\mathit{met}_n$ is the attribute whose value is based on the relative membership of the $n$th synthesised feature. If the classifier contains a test of the form $\mathit{met}_n > c$, the classifier is checking for an instantiated feature whose relative membership of the $n$th synthesised feature exceeds $c$. Conversely, a predicate of the form $\mathit{met}_n \le c$ checks for the absence of such an instantiated feature.
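The presence/absence reading of the two kinds of test can be sketched in a few lines of Python. This is an illustration only: the `MembershipTest` class, the `describe` function and the wording of the output are hypothetical and are not part of TClass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MembershipTest:
    """One classifier test of the form met_n > c or met_n <= c (hypothetical type)."""
    metafeature: str  # name of met, e.g. "LoudRun"
    n: int            # index of the synthesised feature
    op: str           # ">" or "<="
    c: float          # threshold on relative membership


def describe(test: MembershipTest) -> str:
    """Render a test as a statement about presence or absence of a feature."""
    target = f"{test.metafeature} event {test.n}"
    if test.op == ">":
        # met_n > c: some instantiated feature matches synthesised feature n
        return f"contains a {target} (relative membership > {test.c})"
    if test.op == "<=":
        # met_n <= c: no instantiated feature matches synthesised feature n
        return f"contains no {target} (relative membership <= {test.c})"
    raise ValueError(f"unsupported operator: {test.op!r}")


print(describe(MembershipTest("LoudRun", 1, ">", 0.5)))
print(describe(MembershipTest("LoudRun", 1, "<=", 0.5)))
```

The threshold 0.5 and the index 1 are arbitrary values chosen for the example.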

Attribute relabelling in this manner is easy to implement as a sequence of string substitutions on the output from the learner. Figure 5.14 shows the actual classifier produced by TClass on the Tech Support domain. Using this, together with the information about the instantiated features, we can convert it to the form shown in Figure 5.15.
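As a rough illustration of relabelling by string substitution, the following Python sketch rewrites attribute names in a learner's textual output. The attribute names (`loudrun_1`, `loudrun_10`), the replacement labels and the `relabel` function are assumptions made for the example; the naming scheme TClass actually uses is not shown here. The sketch performs all substitutions in a single regular-expression pass rather than one after another, so that a label inserted by one substitution cannot be rewritten by a later one.

```python
import re


def relabel(classifier_text: str, labels: dict[str, str]) -> str:
    """Replace whole-word attribute names with human-readable labels."""
    if not labels:
        return classifier_text
    # Longest names first, and word boundaries, so that "loudrun_1"
    # never matches inside "loudrun_10".
    names = sorted(labels, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    # A function replacement avoids backslash interpretation in the labels.
    return pattern.sub(lambda m: labels[m.group(1)], classifier_text)


# Hypothetical learner output and label table.
raw = "loudrun_1 > 0.5 AND loudrun_10 <= 0.5"
labels = {
    "loudrun_1": "LoudRun event 1",
    "loudrun_10": "LoudRun event 10",
}
print(relabel(raw, labels))
```

Matching on word boundaries is a design choice for this sketch; a plain `str.replace` loop would also work provided no attribute name is a prefix of another.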

Figure 5.14: Learnt classifier for Tech Support domain after running TClass on it.
\begin{figure}\footnotesize\begin{boxedverbatim}--- Cluster centroids ---
V-l...
...0): Angry (3.0)Number of Rules : 2\end{boxedverbatim}\normalsize\end{figure}

Figure 5.15: Post-processing Figure 5.14 to make it more readable.
\begin{figure}\footnotesize\begin{boxedverbatim}PART decision list
-----------...
...9.0 r=[9.0,12.0]
durn=3.0 r=[3.0,4.0]\end{boxedverbatim}\normalsize\end{figure}

Figure 5.15 shows the post-processed form of Figure 5.14. For simplicity, the information on the bounds has been moved into a separate event index, which shows the range of bounds for the start and the duration of the LoudRun.
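One way an event index of this kind could be assembled is sketched below in Python. It assumes, purely for illustration, that the range for each parameter is the minimum and maximum over the instantiated features associated with a synthesised feature; the actual procedure is the one discussed in Section 4.10.2. The function names, the parameter name `start` and all numeric values are invented for the example, and only the `name=value r=[low,high]` layout is modelled on the lines visible in Figure 5.15.

```python
def parameter_ranges(instances):
    """Return {parameter: (lowest, highest)} over the given instantiated features."""
    if not instances:
        raise ValueError("need at least one instantiated feature")
    return {
        name: (min(f[name] for f in instances), max(f[name] for f in instances))
        for name in instances[0]
    }


def format_event(centroid, ranges):
    """One line per parameter: the centroid value, then the observed range."""
    return "\n".join(
        f"{name}={centroid[name]} r=[{ranges[name][0]},{ranges[name][1]}]"
        for name in centroid
    )


# Invented instantiated features assigned to one synthesised feature.
matched = [
    {"start": 2.0, "durn": 1.0},
    {"start": 5.0, "durn": 2.5},
    {"start": 3.0, "durn": 1.5},
]
centroid = {"start": 3.0, "durn": 1.5}  # invented synthesised feature

ranges = parameter_ranges(matched)
print(format_event(centroid, ranges))
```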


Mohammed Waleed Kadous 2002-12-10