Next: Conclusions Up: Cylinder-Bell-Funnel - A warm-up Previous: Cylinder-Bell-Funnel - A warm-up   Contents

Comprehensibility

Are the results produced using TClass comprehensible? In this particular case, we can compare the induced definitions to the real ones. We examined the output from J48 and PART, the two learners that produce comprehensible classifiers, and compared it against the definitions generated by naive segmentation.

For naive segmentation, almost identical trees were generated by different folds. An example of such a tree is shown in Figure 6.2. Examination of the trees reveals that the average value in the fifth segment (time 32 to 38) is important. This is a time guaranteed to be in the ``characteristic'' part of the signal, since the latest time the ``middle'' part (with either a plateau, increase or decrease) can begin is at time 32. Clearly, if this value is low, the signal can't be a cylinder or funnel, since they must be high during that period. The next point the tree looks at is the period from time 51 to 57, a region that is guaranteed to fall where either the cylinder is at its peak or the funnel has begun to reduce.

Figure 6.2: An instance of the trees produced by naive segmentation for the CBF domain.
C_5 <= 2.86: bell (235.0)
C_5 ...
...yl (7.0)
| C_8 > 4.96: cyl (229.0)
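
The segment averages that the tree tests (C_5 and C_8) can be illustrated with a minimal sketch. It assumes a signal of length 128 split into 20 equal-width, 0-indexed segments, which is consistent with the time ranges quoted above but is not stated in this section. The function name and the toy signals are hypothetical and are not the actual CBF generator.

```python
def segment_means(signal, n_segments=20):
    """Average a signal over n_segments roughly equal-width segments.

    Assumption: segments are 0-indexed, so for a 128-point signal
    segment 5 covers times 32..37 and segment 8 covers times 51..57.
    """
    n = len(signal)
    means = []
    for k in range(n_segments):
        start = round(k * n / n_segments)
        end = round((k + 1) * n / n_segments)
        chunk = signal[start:end]
        means.append(sum(chunk) / len(chunk))
    return means


# Noise-free toy signals (hypothetical): a plateau and a linear ramp on [20, 80).
cylinder_like = [6.0 if 20 <= t < 80 else 0.0 for t in range(128)]
bell_like = [6.0 * (t - 20) / 60 if 20 <= t < 80 else 0.0 for t in range(128)]

c5_cylinder = segment_means(cylinder_like)[5]
c5_bell = segment_means(bell_like)[5]

# First test of the tree in Figure 6.2: a low C_5 indicates a bell.
print("cylinder-like looks like a bell:", c5_cylinder <= 2.86)
print("bell-like looks like a bell:", c5_bell <= 2.86)
```

Only the first threshold (2.86) is taken from Figure 6.2; the remaining branches of the tree are elided in the figure and are not reproduced here.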

Figure 6.3 shows an example, after post-processing, of a decision tree generated by TClass. Again, the tree has a very clear meaning. It appears that TClass cannot discern the underlying increase in the data, but it nonetheless produces interesting results. The accuracies of the naive segmentation classifiers above and the TClass classifiers below are about the same.

Can we read anything into the meaning of the tree? Certainly we can. The first thing it checks for is a local minimum with a high value at around time 48, the kind that can occur only if the signal is a cylinder (the bell and the funnel are either gradually increasing or decreasing by that time). It then checks for a local maximum very early on in the signal. This could only have come from a cylinder or a funnel. To tell these apart, it then looks for a sudden decrease (indicated by the high negative gradient of -0.5) around time 70. If the drop is present, this is indicative of a cylinder, whereas if there is no sudden drop, it must be a funnel. Otherwise, the classifier looks for a rapid increase at the beginning, indicated once again by the large gradient and the time early in the sequence. If the signal does not have this sudden increase, it's a bell; otherwise, depending once again on the sudden drop at the end, it's either a funnel or a cylinder.

As can be seen, the above definition, once translated into words, is quite comprehensible.
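
The verbal reading of the tree above can be written down as a short sketch. The function and parameter names are hypothetical, and each boolean stands in for the presence of one event described in the text; this is not the TClass implementation, only a transcription of the prose. The last branch assumes, by analogy with the earlier one, that a sudden drop indicates a cylinder.

```python
def classify_cbf(high_local_min_mid, early_local_max,
                 sudden_drop_late, rapid_rise_early):
    """Transcribe the verbal description of the tree in Figure 6.3.

    Each argument is a boolean saying whether the corresponding event
    (hypothetical names) was found in the signal.
    """
    # A high-valued local minimum around time 48 can only be a cylinder.
    if high_local_min_mid:
        return "cylinder"
    # An early local maximum means cylinder or funnel; a sudden drop
    # around time 70 separates the two.
    if early_local_max:
        return "cylinder" if sudden_drop_late else "funnel"
    # No rapid increase at the beginning means a bell.
    if not rapid_rise_early:
        return "bell"
    # Otherwise, decide again on the sudden drop at the end (assumed
    # to indicate a cylinder, as in the earlier branch).
    return "cylinder" if sudden_drop_late else "funnel"


print(classify_cbf(False, False, False, False))
print(classify_cbf(False, True, False, True))
```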

Figure 6.3: One decision tree produced by TClass on the CBF domain.
IF c HAS LocalMin: time = 48.0...
...r of Leaves : 8
Size of the tree : 15

Figure 6.4: Events used by the decision tree in Figure 6.3.
Event index
-----------
*1: lm...
....23,0.51]
duration=18.0 r=[12.0,32.0]

We did the same with the PART rule learner. Figure 6.5 shows a typical ruleset created by TClass with PART. The first rule, for instance, checks for a high local minimum around time 44, which should only be possible if the signal is a cylinder or a funnel; it then looks for the gentle decreasing signal that is characteristic of the funnel. If this is not present, it's a cylinder. If it's not a cylinder, the second rule looks for a sudden increase early on in the signal (typical of cylinders or funnels) or a gradual decrease (typical of funnels). If the signal has neither of these, it's a bell. Finally, to ensure it's a funnel, the third rule checks whether there is either a middle-time high maximum (more characteristic of a cylinder) or a middle-time low maximum (only possible with a bell). If the signal has neither of these, it must be a funnel. Between them, these three rules cover 88 per cent of instances. In the author's opinion, these rules are more comprehensible than either the naive segmentation approaches or the J48 TClass trees.
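
Because PART produces an ordered list of rules in which the first matching rule decides the class, the three rules described above can be sketched as a small decision list. All names are hypothetical, each flag stands for the presence of one event mentioned in the text, and the sketch assumes that the "gentle decreasing signal" of the first rule and the "gradual decrease" of the second are the same event. The remaining rules of the ruleset are not described in this section, so the sketch returns `None` when none of the three rules fires.

```python
# Ordered (label, condition) pairs; conditions read a dict of event flags.
RULES = [
    # High local minimum around time 44 and no gentle decrease.
    ("cylinder", lambda e: e["high_local_min_mid"] and not e["gentle_decrease"]),
    # Neither an early sudden increase nor a gradual decrease.
    ("bell", lambda e: not e["sudden_rise_early"] and not e["gentle_decrease"]),
    # Neither a middle-time high maximum nor a middle-time low maximum.
    ("funnel", lambda e: not e["high_max_mid"] and not e["low_max_mid"]),
]


def classify_by_rules(events, rules=RULES):
    """Return the label of the first rule whose condition holds, else None."""
    for label, condition in rules:
        if condition(events):
            return label
    return None


example = {
    "high_local_min_mid": False,
    "gentle_decrease": True,
    "sudden_rise_early": True,
    "high_max_mid": False,
    "low_max_mid": False,
}
print(classify_by_rules(example))
```

The order of the rules matters: the funnel rule is only reached when the cylinder and bell rules have both failed, which mirrors how the text reads the ruleset from top to bottom.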

Figure 6.5: A ruleset produced by TClass using PART as the learner.
IF c HAS LocalMin: time = 44.0...
....0): bell (3.0)
Number of Rules : 6

Figure 6.6: Event index for the ruleset in Figure 6.5.
Event index
-----------
*1: lm...
....0,53.0]
value=4.82 r=[2.60,5.58]


Mohammed Waleed Kadous 2002-12-10