next up previous contents
Next: Per-class clustering Up: Early versions of TClass Previous: Early versions of TClass   Contents

Line-based segmentation

Most of our time series consist of continuous channels; and hence one obvious solution to converting a time series into a form more amenable to learning was to convert it into a sequence of lines. This idea is not novel: these ideas have been explored by both Pednault [Ped89] and Manganaris [Man97]. We contacted both authors for implementations of their algorithms for reuse, but received no replies. We therefore tried to reimplement their work using our own algorithm.

Approaches are typically based on minimum-message length (MML) [WB68] principles: Assume that you wish to transmit the time series over a communication channel, then what is the minimum number of bits required to transmit that information? The number of bits can be expressed as the number of bits required to send the model of the time series (e.g. a piecewise-linear model of the time series) plus the number of bits to transmit the deviations from the model (in other words, the error of the model). Hence all one has to do to perform line segmentation is generate all possible models, calculate the number of bits for the model, calculate the number of bits for the residuals and simply choose the model that has the lowest number of bits.

In practice, the above is slow to implement - it requires in general the consideration of $ O(f^3)$ models, where $ f$ is the average number of frames. Some optimisations are therefore necessary. Furthermore, assumptions have to be made on prior distributions of values in order to get correct approximations of such models.

Our implementation used a very simple algorithm with some very basic assumptions. The assumptions were that encoding a line segment was assumed to take a fixed number of bits; and encoding the residuals depended on the normalised distance from our model for each timestep. The algorithm was also simple: to begin with, the signal starts out with each pair of points being a line segment. The algorithm then evaluates the reduction in the total number of bits for the whole signal of fusing each pair of adjacent line segments into one. The pair that maximises the reduction in message length is fused. This process is repeated until there are no more line segment fusions that reduce the message length.

Figure A.1: Line-based segmentation example for y channel of sign come.
\begin{figure}\begin{center}
\leavevmode \epsfxsize =5in \epsfbox{come-y-approx.eps}\par\centering\centering\end{center}\end{figure}

Although incredibly simple, it works well. Figure A.1 shows a typical example. However, there were a few problems with this approach:

It is also interesting to note that line segmentation could easily be included (and is in fact included) in the TClass system: Each line segment corresponds to a metafeature. The main problem is the speed.

The problem at the time was: what to do with these line segments? Several things were attempted:

The clustering mentioned above was performed in parameter space. The parameter space for line segments was four-dimensional: midtime, gradient, duration and average.


next up previous contents
Next: Per-class clustering Up: Early versions of TClass Previous: Early versions of TClass   Contents
Mohammed Waleed Kadous 2002-12-10