Many other ML researchers have looked at issues closely related to temporal classification. These include work on sequence prediction (e.g. Dietterich and Michalski's work with Eleusis [DM86]); temporal logics and their application to recognising events, for example Kumar's work on temporal event conceptualisation [KM87]; and recent work on context detection and extraction for machine learning applications [Wid96,HHS98]. While some of these areas bear interesting relations to temporal classification, they differ from it in several respects. Sequence prediction is not about learning from labelled examples. The event conceptualisation work focuses on recognising temporal events rather than on learning the events themselves. The work on context detection is about selecting, on a dynamic basis, which static classifier to use, whereas temporal classification is about classifying the dynamics themselves.
There have been several other major works on issues related to temporal properties. Hau and Coiera's work on learning qualitative models of dynamic systems [HC93] is an illustrative example. Genmodel (the system developed by Coiera) might discover that the amount of blood the heart is able to pump out depends on the volume of blood already in the heart, and that when one increases, so does the other. Bratko has explored similar ideas in various projects, most notably KARDIO [IB89]. His students have also extended the approach to behavioural cloning of human operators controlling dynamic systems [SB99].
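The cardiac relation above, where one quantity rises whenever the other does, is the kind of monotonic constraint often written M+ in qualitative simulation (QSIM). The minimal sketch below only tests whether two sampled series are consistent with such a constraint. Matching directions of change at every step is a necessary condition, not a proof of the relation. This is not Genmodel's learning procedure, and the variable names and readings are invented for illustration.

```python
from typing import Sequence


def direction(delta: float, eps: float = 1e-9) -> int:
    """Qualitative direction of a change: +1 increasing, -1 decreasing, 0 steady."""
    if delta > eps:
        return 1
    if delta < -eps:
        return -1
    return 0


def consistent_with_m_plus(xs: Sequence[float], ys: Sequence[float]) -> bool:
    """True if y moves in the same qualitative direction as x at every step,
    i.e. the samples are consistent with an M+ (monotonically increasing) relation."""
    if len(xs) != len(ys):
        raise ValueError("series must have equal length")
    return all(
        direction(xs[i] - xs[i - 1]) == direction(ys[i] - ys[i - 1])
        for i in range(1, len(xs))
    )


# Invented readings: volume already in the heart vs. volume pumped out.
filling_volume = [100.0, 110.0, 120.0, 115.0]
pumped_volume = [60.0, 65.0, 70.0, 68.0]
print(consistent_with_m_plus(filling_volume, pumped_volume))  # True
```

A learner in this style would search over many candidate constraints of this kind. The check above is only the inner test applied to one candidate.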
Syntactic representations of signals have also been explored. Fu [Fu82] describes hand-constructed grammars for the syntactic parsing (with error correction) of signals. In the syntactic pattern recognition approach, a time series is converted to a sequence of symbols or terminals. A grammar is then defined for each class that allows or disallows particular sequences of terminals.
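The two steps of the syntactic approach can be sketched in a few lines. First the series is converted into terminals, then each class grammar is tested against the terminal string. The sketch assumes three terminals derived from successive differences and uses regular expressions as the simplest possible grammars. Fu's grammars are more general and include error correction, which is not modelled here. The terminal alphabet, tolerance and class grammars are all hypothetical.

```python
import re
from typing import Sequence


def symbolise(series: Sequence[float], tol: float = 0.5) -> str:
    """Turn each successive change into a terminal: 'u' (up), 'd' (down), 'f' (flat)."""
    terminals = []
    for prev, cur in zip(series, series[1:]):
        delta = cur - prev
        terminals.append("u" if delta > tol else "d" if delta < -tol else "f")
    return "".join(terminals)


# Hypothetical per-class grammars, written as regular expressions over terminals.
GRAMMARS = {
    "peak": re.compile(r"f*u+f*d+f*"),
    "rise": re.compile(r"f*u+f*"),
}


def classify_by_grammar(series: Sequence[float]) -> list[str]:
    """Return every class whose grammar accepts the whole terminal string."""
    terminals = symbolise(series)
    return [label for label, g in GRAMMARS.items() if g.fullmatch(terminals)]


print(classify_by_grammar([0, 1, 2, 3, 2, 1]))  # ['peak']
print(classify_by_grammar([0, 0, 1, 2, 2]))     # ['rise']
```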
Lee and Kim [LK95] take this syntactic approach further by adding an inductive step and applying it to financial markets. Based on knowledge of the financial markets, they develop a grammar for events; in our terms, they are defining a fixed set of metafeatures for the problem. Two operators are allowed between events: CONC (concurrent events) and SEQ (sequential events). In some ways their work is similar to our own: their use of terminals with parameters corresponds closely to our concept of metafeatures (see Chapter 4). However, their technique does not appear to generalise readily beyond the domain for which the grammar was written. They are also less interested in classification than in prediction of financial time series.
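One way to picture parameterised terminals combined with SEQ and CONC is to treat each event as a labelled time interval that carries parameters. The two operators then become interval tests. The sketch below assumes that SEQ means the first event has ended by the time the second begins, and that CONC means the two intervals overlap. These semantics, the `Event` type and the example parameters are illustrative assumptions, not Lee and Kim's definitions.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """A parameterised terminal: a named event spanning [start, end] with attributes."""
    name: str
    start: float
    end: float
    params: dict = field(default_factory=dict)


def seq(a: Event, b: Event) -> bool:
    """Assumed SEQ semantics: a has finished by the time b begins."""
    return a.end <= b.start


def conc(a: Event, b: Event) -> bool:
    """Assumed CONC semantics: the two events overlap in time."""
    return a.start < b.end and b.start < a.end


# Hypothetical events with illustrative parameters.
rise = Event("price_rise", 0, 5, {"slope": 0.8})
spike = Event("volume_spike", 3, 4, {"ratio": 2.5})
fall = Event("price_fall", 6, 9, {"slope": -0.4})

# A composite pattern: a rise accompanied by a volume spike, followed by a fall.
pattern_holds = conc(rise, spike) and seq(rise, fall)
print(pattern_holds)  # True
```

The parameters stored in `params` are where a learner could attach thresholds, for example a minimum slope for a rise to count.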
Some research has also been conducted into exactly how to detect sub-events of time series, in other words, how to build metafeature extractors. Horn [Hor93] developed RDR-C4, a system both for manually building ripple-down rules and for inducing them from data. Horn also allowed rules to be applied to time series, with options such as no change, increasing, decreasing, step up, step down and peaks. Although the rule editor was developed, it is unclear whether the detection algorithm was implemented and, if so, how it worked. He cites work by Love and Simaan [LS98] and the closely related work of Anderson et al. [ACH+89], which develops techniques for extracting three kinds of event from univariate time series data: peaks, steps and ramps. The authors manually build filter-based extractors for these three metafeatures. They then manually build a ruleset that classifies the different events that can occur on the basis of these observations. They used their system for detecting "out-of-condition" states in industrial systems rather than for classification.
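To make the idea of a metafeature extractor concrete, here is a deliberately simple detector for the three event types extracted by Anderson et al.: steps, ramps and peaks. It uses plain threshold rules on first differences rather than the filters described in their work. The function names, thresholds and test signal are assumptions chosen for illustration.

```python
from typing import Sequence


def detect_steps(x: Sequence[float], jump: float) -> list[int]:
    """Indices i where the signal moves by at least `jump` between samples i-1 and i."""
    return [i for i in range(1, len(x)) if abs(x[i] - x[i - 1]) >= jump]


def detect_peaks(x: Sequence[float], height: float) -> list[int]:
    """Indices of samples that exceed both neighbours by at least `height`."""
    return [
        i for i in range(1, len(x) - 1)
        if x[i] - x[i - 1] >= height and x[i] - x[i + 1] >= height
    ]


def detect_ramps(x: Sequence[float], max_step: float, min_len: int) -> list[tuple[int, int]]:
    """(start, end) sample indices of runs of at least `min_len` consecutive changes
    that share a sign and are each smaller than `max_step` (gradual, not a jump)."""
    ramps = []
    start, sign = 0, 0  # sign 0 means no run is currently open
    for i in range(1, len(x)):
        d = x[i] - x[i - 1]
        s = (d > 0) - (d < 0)
        gentle = s != 0 and abs(d) < max_step
        if gentle and s == sign:
            continue  # the current run carries on
        if sign != 0 and (i - 1) - start >= min_len:
            ramps.append((start, i - 1))  # close a long-enough run
        start, sign = (i - 1, s) if gentle else (0, 0)
    if sign != 0 and (len(x) - 1) - start >= min_len:
        ramps.append((start, len(x) - 1))
    return ramps


signal = [0, 0, 5, 5, 5, 6, 7, 8, 9, 9, 20, 9, 9]
print(detect_steps(signal, jump=3))                 # [2, 10, 11]
print(detect_peaks(signal, height=3))               # [10]
print(detect_ramps(signal, max_step=3, min_len=3))  # [(4, 8)]
```

The narrow spike at index 10 is reported both as a peak and as a pair of steps. A hand-built ruleset of the kind described above would have to resolve ambiguities like this one.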
Shi and Shimizu [SS92] built a neuro-fuzzy controller for yeast production. They discretised both time and concentration, and then passed a sliding temporal window over the data. Each time-concentration pair was associated with a single neuron, and these neurons formed the input to a standard backpropagation network.
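The input encoding described for Shi and Shimizu can be sketched as follows. Each window is laid out on a grid of (time offset, concentration bin) cells, and the cell matching each sample is switched on, giving one input unit per time-concentration pair. The window width, bin edges and function names are assumptions. The backpropagation network that would consume these vectors is omitted.

```python
from bisect import bisect_right
from typing import Sequence


def encode_window(window: Sequence[float], bin_edges: Sequence[float]) -> list[int]:
    """One 0/1 input unit per (time step, concentration bin); exactly one unit per
    time step is switched on. `bin_edges` are sorted interior bin boundaries."""
    n_bins = len(bin_edges) + 1
    units = [0] * (len(window) * n_bins)
    for t, value in enumerate(window):
        units[t * n_bins + bisect_right(bin_edges, value)] = 1
    return units


def sliding_inputs(series, width, bin_edges):
    """Encode every window of `width` consecutive samples, advancing one sample at a time."""
    return [encode_window(series[i:i + width], bin_edges)
            for i in range(len(series) - width + 1)]


print(sliding_inputs([0.1, 0.6, 1.2, 0.4], width=2, bin_edges=[0.5, 1.0]))
# [[1, 0, 0, 0, 1, 0], [0, 1, 0, 0, 0, 1], [0, 0, 1, 1, 0, 0]]
```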
This area has become an increasingly popular research topic. A workshop held at AAAI '98 [Dan98], for example, focused on temporal prediction but also contained several papers on learning from time series. Keogh and Pazzani [KP98] look at automated ways of clustering time series from ECG signals and Space Shuttle data, using a piecewise model combined with segmentation and agglomerative clustering. Oates et al. [OJC98] apply a system to extracting patterns from network failures by examining all possible sequences of events and keeping track of how often each occurs.
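The counting step in the Oates et al. description can be illustrated by tallying every contiguous subsequence of symbolic events up to a fixed length. This sketch only shows frequency counting, under the assumption that events are already discrete labels. It does not reproduce their system, and the event names are invented.

```python
from collections import Counter
from typing import Hashable, Sequence


def count_subsequences(events: Sequence[Hashable], max_len: int) -> Counter:
    """Count every contiguous run of 1..max_len events in the sequence."""
    counts: Counter = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(events) - n + 1):
            counts[tuple(events[i:i + n])] += 1
    return counts


log = ["link_down", "timeout", "retry", "timeout", "retry", "failover"]
counts = count_subsequences(log, max_len=2)
print(counts[("timeout", "retry")])  # 2
```

Enumerating every subsequence grows quickly with `max_len`. A practical system would prune candidates, for example by discarding sequences whose frequency falls below a threshold.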
Shahar [SM95] proposes an expert system architecture for knowledge-based temporal abstraction and notes that it could be used for learning, though he does not pursue this himself. He applies the techniques to clinical domains. Paliouras [Pal97] discusses refinement of the temporal parameters of an existing temporal expert system, but the approach cannot modify the model itself. Manganaris [Man97] developed a system for supervised classification of univariate signals that uses piecewise polynomial modelling combined with scale-space analysis (a technique that lets the system cope with patterns occurring at different temporal scales). He applies it to space shuttle data as well as to an artificial dataset.
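One elementary step in temporal abstraction is to turn time-stamped measurements into qualitative state intervals (for example low, normal and high) by merging consecutive samples that share a state. The sketch below shows only that step, with invented thresholds and data. The domain knowledge that a knowledge-based architecture such as Shahar's relies on is not modelled here.

```python
def qualitative_state(value: float, low: float, high: float) -> str:
    """Map a raw value to a qualitative state using fixed (hypothetical) thresholds."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"


def abstract_states(times, values, low, high):
    """Merge consecutive samples with the same qualitative state into
    (state, start_time, end_time) intervals."""
    intervals: list[tuple[str, float, float]] = []
    for t, v in zip(times, values):
        state = qualitative_state(v, low, high)
        if intervals and intervals[-1][0] == state:
            intervals[-1] = (state, intervals[-1][1], t)  # extend the open interval
        else:
            intervals.append((state, t, t))
    return intervals


# Hypothetical measurements and thresholds.
print(abstract_states([0, 1, 2, 3, 4], [3.0, 3.2, 5.5, 5.8, 4.0], low=3.5, high=5.0))
# [('low', 0, 1), ('high', 2, 3), ('normal', 4, 4)]
```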
Mannila et al. [MTV95] have also looked at temporal classification problems, in particular applying their work to network traffic analysis. In their model, streams are sequences of time-labelled events rather than regularly sampled channels. Learning is accomplished by trying to find sequences of events, together with the relevant time constraints, that fit the data.
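For an event-stream model like this one, a time constraint can be expressed as a window width. A sequence of events is then scored by how many windows contain it in the right order. The sketch below computes that window-based frequency for a serial episode, assuming integer timestamps and a window starting at every integer time that overlaps the stream. It recomputes each window from scratch for clarity, which published algorithms avoid. The event log is invented.

```python
from typing import Sequence


def contains_serial(window: Sequence[tuple[str, int]], episode: Sequence[str]) -> bool:
    """True if the episode's event types occur in order with strictly increasing times."""
    idx, last_time = 0, None
    for etype, t in sorted(window, key=lambda e: e[1]):
        if etype == episode[idx] and (last_time is None or t > last_time):
            idx, last_time = idx + 1, t  # earliest match for this position
            if idx == len(episode):
                return True
    return False


def episode_frequency(events, episode, width):
    """Fraction of windows [s, s + width), one starting at each integer s that
    overlaps the event sequence, in which the serial episode occurs."""
    times = [t for _, t in events]
    starts = range(min(times) - width + 1, max(times) + 1)
    hits = sum(
        contains_serial([e for e in events if s <= e[1] < s + width], episode)
        for s in starts
    )
    return hits / len(starts)


log = [("A", 1), ("B", 2), ("C", 5), ("A", 8), ("B", 12)]
print(episode_frequency(log, ("A", "B"), width=5))  # 0.3125
```

Widening the window loosens the time constraint. With a width of 3, only the closely spaced A, B pair is counted and the frequency drops to 2/14.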
Das et al. [DLM+98] also worked on extracting rules from a single channel of data. Their rules take the form "A is followed by B", where A and B are events defined in terms of changes in value.
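Once a channel has been turned into a sequence of event symbols, a rule of the form "A is followed by B" can be scored by its confidence. Confidence here is the fraction of occurrences of A that see at least one B within a fixed horizon. The sketch assumes the symbolic sequence already exists and that the horizon is measured in steps. The symbols and function name are hypothetical, and the step that derives events from raw values is omitted.

```python
from typing import Sequence


def rule_confidence(symbols: Sequence[str], a: str, b: str, horizon: int) -> float:
    """Confidence of the rule 'a is followed by b within `horizon` steps': the fraction
    of occurrences of a that have at least one b among the next `horizon` symbols."""
    positions = [i for i, s in enumerate(symbols) if s == a]
    if not positions:
        return 0.0
    followed = sum(1 for i in positions if b in symbols[i + 1:i + 1 + horizon])
    return followed / len(positions)


# Hypothetical event symbols: u = rise, d = fall, f = flat.
events = list("ufduddufffd")
print(rule_confidence(events, "u", "d", horizon=2))  # 0.6666666666666666
print(rule_confidence(events, "u", "d", horizon=4))  # 1.0
```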
Rosenstein and Cohen [RC98] used delay portraits (a representation where the state at time $t$ is compared to its state at time $t - \tau$ for some fixed delay $\tau$). These delay portraits are then clustered to create new representations, but they tend to be sensitive to variations in delay and speed.
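A delay portrait can be built by pairing each sample with the sample a fixed delay earlier, which gives a trajectory in a two-dimensional space. The sketch assumes a single regularly sampled channel and a delay `tau` measured in samples. The clustering step that Rosenstein and Cohen apply afterwards is not shown.

```python
from typing import Sequence


def delay_portrait(x: Sequence[float], tau: int) -> list[tuple[float, float]]:
    """Pair each sample x[t] with the sample `tau` steps earlier, x[t - tau]."""
    if tau < 1:
        raise ValueError("tau must be a positive number of samples")
    return [(x[t], x[t - tau]) for t in range(tau, len(x))]


series = [0, 1, 2, 3, 2, 1, 0]
print(delay_portrait(series, 2))
# [(2, 0), (3, 1), (2, 2), (1, 3), (0, 2)]
```

Replaying the same shape at half speed (each sample repeated twice) and keeping the same `tau` produces a different set of points. This is one concrete source of the sensitivity to speed noted above.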
Keogh and Pazzani, in addition to the previously mentioned work on dynamic time warping, have been developing time series classification techniques for use with large databases of time series [KCMP01]. They represent a time series as a tree: the root is the mean of the whole series, its two children are the means of the first and second halves respectively, and so on. This representation allows computations on time series, such as similarity comparisons, to be performed very quickly.
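The tree of means can be built bottom-up by repeatedly averaging adjacent pairs. The sketch assumes that series lengths are powers of two, so every split is into equal halves. It also shows one reason such a representation speeds up search. A Euclidean distance computed from the segment means at any level, scaled by the square root of the segment length, never exceeds the true distance (a consequence of the Cauchy-Schwarz inequality). Distant candidates can therefore be rejected using only a few means. This illustrates the representation and is not the indexing scheme of [KCMP01]; the function names are hypothetical.

```python
import math
from typing import Sequence


def mean_tree(x: Sequence[float]) -> list[list[float]]:
    """Levels of segment means: level 0 is the mean of the whole series, level k
    holds the means of its 2**k equal segments, and the last level is the raw data.
    Assumes len(x) is a power of two so that every split is into equal halves."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    levels = [[float(v) for v in x]]
    while len(levels[-1]) > 1:
        finer = levels[-1]
        # The mean of a segment is the average of the means of its two equal halves.
        levels.append([(finer[i] + finer[i + 1]) / 2 for i in range(0, len(finer), 2)])
    levels.reverse()  # coarsest level first, matching root-to-leaves order
    return levels


def coarse_distance(tx, ty, level):
    """Euclidean distance between segment means at one level, scaled by
    sqrt(segment length). It never exceeds the true Euclidean distance."""
    a, b = tx[level], ty[level]
    seg_len = len(tx[-1]) // len(a)
    return math.sqrt(seg_len * sum((p - q) ** 2 for p, q in zip(a, b)))


tx = mean_tree([1, 3, 5, 7])
ty = mean_tree([0, 0, 0, 0])
print(tx)  # [[4.0], [2.0, 6.0], [1.0, 3.0, 5.0, 7.0]]
print([round(coarse_distance(tx, ty, k), 3) for k in range(3)])  # [8.0, 8.944, 9.165]
```

The bound tightens as the level gets finer and equals the true distance at the raw level.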
Another interesting development is the application of dynamic Bayesian networks to temporal classification tasks. They are not specifically designed for temporal classification; they are more commonly used for prediction, or for estimating the current state given the previous state estimate. Nevertheless, Zweig and Russell [ZR98] have applied them to the task of speech recognition. The main problem with using dynamic Bayesian networks is that, while algorithms for learning the parameters of a Bayes net are well advanced, learning the structure of Bayes nets has proved more difficult. Friedman et al. [FMR98] are developing techniques for learning the structure of dynamic Bayes networks; it remains to be seen whether these techniques can be applied in temporal classification domains.
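The simplest dynamic Bayesian network is a hidden Markov model, whose structure is fixed in advance so that only its parameters need to be learned. The sketch below classifies a discrete observation sequence by computing its likelihood under one HMM per class and picking the most likely class. It uses the forward algorithm, with per-step rescaling to avoid numerical underflow. The two models and their probabilities are invented, and training them (for example with Baum-Welch) is omitted. This is not Zweig and Russell's speech model.

```python
import math
from typing import Sequence


def log_likelihood(obs: Sequence[int], start, trans, emit) -> float:
    """Forward algorithm for a discrete HMM. start[i] = P(first state is i),
    trans[i][j] = P(next state j | state i), emit[i][o] = P(symbol o | state i)."""
    n = len(start)
    total = 0.0
    alpha: list[float] = []
    for t, o in enumerate(obs):
        if t == 0:
            alpha = [start[j] * emit[j][o] for j in range(n)]
        else:
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return -math.inf  # sequence impossible under this model
        total += math.log(scale)  # log P(obs) is the sum of the log scale factors
        alpha = [a / scale for a in alpha]  # rescale to avoid numerical underflow
    return total


def classify_sequence(obs, models):
    """Pick the class whose model assigns the observations the highest likelihood."""
    return max(models, key=lambda label: log_likelihood(obs, *models[label]))


EMIT = [[0.9, 0.1], [0.1, 0.9]]  # state 0 mostly emits symbol 0, state 1 symbol 1
MODELS = {
    "alternating": ([0.5, 0.5], [[0.1, 0.9], [0.9, 0.1]], EMIT),
    "persistent": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], EMIT),
}

print(classify_sequence([0, 1, 0, 1, 0, 1], MODELS))  # alternating
print(classify_sequence([0, 0, 0, 1, 1, 1], MODELS))  # persistent
```

Learning a richer network structure, rather than fixing it as here, is exactly the harder problem described above.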