next up previous contents
Next: Comprehensibility Up: Artificial datasets Previous: Artificial datasets   Contents

Cylinder-Bell-Funnel - A warm-up

The first artificial domain we will be considering is a simple one, but one that has been used and explored by other researchers.

The artificial cylinder-bell-funnel task was originally proposed by Saito [Sai94], and further worked on by Manganaris [Man97]. The task is to classify a stream as one of three classes, cylinder ($ c$), bell ($ b$) or funnel ($ f$). Samples are generated as follows:


$\displaystyle c(t)$ $\displaystyle =$ $\displaystyle (6+\eta)\cdot\chi_{[a,b]}(t) + \epsilon(t)$  
$\displaystyle b(t)$ $\displaystyle =$ $\displaystyle (6+\eta)\cdot\chi_{[a,b]}(t)\cdot(t-a)/(b-a) +
\epsilon(t)$  
$\displaystyle f(t)$ $\displaystyle =$ $\displaystyle (6+\eta)\cdot\chi_{[a,b]}(t)\cdot(b-t)/(b-a) +
\epsilon(t)$  
$\displaystyle \chi_{[a,b]}$ $\displaystyle =$ $\displaystyle \left\{ \begin{array}{ll}
0 & t < a,\\
1 & a \leq t \leq b \\
0 & t > b \\
\end{array}\right.$  

Figure 6.1: Cylinder-bell-funnel examples.
\begin{figure}\begin{center}
\leavevmode \epsfxsize =5in \epsfbox{cylbellfunnel.eps}\par\centering\centering\end{center}\end{figure}

where $ \eta$ and $ \epsilon(t)$ are drawn from a standard normal distribution $ N(0,1)$, $ a$ is an integer drawn uniformly from $ [16,32]$ and $ b\!-\!a$ is an integer drawn uniformly from $ [32, 96]$. Figure 6.1 shows instances of each class. The cylinder class is characterised by a plateau from time $ a$ to $ b$, the bell class by a gradual increase from $ a$ to $ b$ followed by a sudden decline and the funnel class by a sudden increase at $ a$ and a gradual decrease until $ b$. Although univariate (i.e., only has one channel) and of a fixed length (128 frames), the CBF task attempts to characterise some of the typical properties of temporal domains. Firstly, there is random amplitude variation as a result of the $ \eta$ in the equation. Secondly, there is random noise (represented by the $ \epsilon(t)$). Thirdly, there is significant temporal variation in both the start of events (since $ a$ can vary from 16 to 32) and the duration of events (since $ b\!-\!a$ can vary from 32 to 96).

266 instances of each class were generated. For ease of comparison with previous results, 10-fold cross validation was used.

We have previously looked at the cylinder-bell-funnel domain [Kad99]. In that work, we compared an earlier version of TClass (see Appendix A) against naive segmentation and showed that we performed significantly better. Table 6.1 shows the table of results from that work.


Table 6.1: Previously published error rates on the CBF domain.


Features/Learner CBF
TClass/Naïve Bayes $ 3.67 \pm 0.61$
Segments/Naïve Bayes $ 6.20 \pm 0.67$
TClass/C4.5 $ \mathbf{1.90 \pm 0.57}$
Segment/C4.5 $ 2.41 \pm 0.48$


As an initial test, we ran all the learners, with and without smoothing. In all tables, the bounds represent the standard error of the mean (the standard deviation divided by the number of of folds).


Table 6.2: Error rates using the current version of TClass on the CBF domain.
Approach Unsmoothed Smoothed
TClass with J48 2.28 $ \pm$ 0.69 1.14 $ \pm$ 0.24
TClass with PART 4.56 $ \pm$ 0.82 1.77 $ \pm$ 0.36
TClass with IBL 56 $ \pm$ 1.24 42.78 $ \pm$ 1.78
TClass with Bagging/J48 1.9 $ \pm$ 0.48 1.27 $ \pm$ 0.43
TClass with AdaBoost/J48 1.39 $ \pm$ 0.33 1.01 $ \pm$ 0.24
TClass with Naive Bayes 3.16 $ \pm$ 0.70 2.78 $ \pm$ 0.61
Naive Segmentation 0 $ \pm$ 0 0 $ \pm$ 0
Hidden Markov Model 0 $ \pm$ 0 0.38 $ \pm$ 0.18


The above table shows the best of the performers for the naive segmentation and hidden Markov models. The complete results for naive segmentation and HMMs for unsmoothed classification are shown in Tables 6.3 and 6.4.


Table 6.3: Error rates for the naive segmentation algorithm on the CBF domain.
Approach Segments
  3 5 10 20
J48 $ 7.47 \pm 0.90$ $ 3.16 \pm 0.54$ $ 1.90 \pm 0.37$ $ 1.27 \pm 0.47$
PART $ 8.10 \pm 0.67$ $ 2.78 \pm 0.61$ $ 2.15 \pm 0.51$ $ 1.52 \pm 0.69$
IB1 $ 6.58 \pm 0.85$ $ 2.28 \pm 0.67$ $ \mathbf{0.00 \pm 0.00}$ $ \mathbf{0.00 \pm 0.00}$
AB $ 6.84 \pm 0.67$ $ 2.03 \pm 0.45$ $ 1.27 \pm 0.47$ $ 1.01 \pm 0.47$
Bag $ 6.84 \pm 0.78$ $ 2.28 \pm 0.61$ $ 1.90 \pm 0.45$ $ 1.27 \pm 0.51$



Table 6.4: Error rates of hidden Markov models on CBF domain.
Topology Raw Raw + Derivative
lr-3 $ 7.85 \pm 0.87$ $ 5.06 \pm 0.62$
lr-5 $ 0.25 \pm 0.16$ $ \mathbf{0.00 \pm 0.00}$
lr-10 $ 0.25 \pm 0.16$ $ \mathbf{0.00 \pm 0.00}$
lr-20 $ 0.00 \pm 0.00$ $ \mathbf{0.00 \pm 0.00}$
lrs1-3 $ 65.57 \pm 1.22$ $ 60.38 \pm 1.23$
lrs1-5 $ 0.51 \pm 0.27$ $ 32.15 \pm 1.32$
lrs1-10 $ 0.63 \pm 0.20$ $ \mathbf{0.00 \pm 0.00}$
lrs1-20 $ 0.13 \pm 0.12$ $ 0.13 \pm 0.12$
er-3 $ 36.20 \pm 1.43$ $ 5.82 \pm 0.82$
er-5 $ 36.71 \pm 1.23$ $ 36.20 \pm 1.53$
er-10 $ 36.71 \pm 1.23$ $ 36.20 \pm 1.53$
er-20 $ 36.71 \pm 1.23$ $ 36.20 \pm 1.53$


The results are very surprising. For example, one surprise is the performance of the baseline algorithms: nowhere else in the literature has their performance been compared. It's surprising therefore to see them perform amongst the best and they even do better than previously published results (including our own, which were previously the best).

It turns out that the dataset may be too simple to test the capabilities of our system. This makes it difficult to evaluate accuracy and other phenomena. Hence we need to construct a dataset that is (a) more difficult and (b) more typical of the domains that we are interested in.

Let us suggest why the two controls performed so well - they are capturing the natural aggregate nature of the data. There is only one real event in each of the three classes; and all of these events are of extended duration - at a bare minimum the one event characterises itself over 25 per cent of the data (32 frames) and possibly as much as 75 per cent of the data (96 frames). On average, the one event therefore occupies 50 per cent of the channel. Although the underlying signal is masked by the noise; the noise has a Gaussian distribution: exactly the type of noise that aggregate approaches (such as naive segmentation and HMMs) are designed to exclude. Further investigation reveal an interesting result in the HMMs: the self-transition probability for each state was almost always approximately 0.8. This means the hidden Markov model spends approximately the same time in each state. Some simple mathematics show that if $ a_{ii}=0.8$ we stay in each state for about 7 time steps, very close to the size of a segment used by the naive segmenter (6.25 timesteps). It seems that the HMM defaulting to a naive segmentation approach: equal size regions; with a mean and an average for each region functioning as a probabilistic model.

The above results might lead one to despair of the performance of TClass, since it doesn't appear to compete that well in terms of accuracy. However, if one's objective becomes the relentless pursuit of accuracy, then TClass can be used with voting to improve accuracy. The above experiment was therefore repeated with 10 repetitions of TClass with an AdaBoost backend, and the outcomes voted. 100 per cent accuracy was obtained. Indeed, 100 per cent 10-fold cross-validated accuracy was obtained with as few as 3 voters - the minimum number of possible voters.



Subsections
next up previous contents
Next: Comprehensibility Up: Artificial datasets Previous: Artificial datasets   Contents
Mohammed Waleed Kadous 2002-12-10