An older technique that has fallen out of favour since the advent of HMMs is dynamic time warping [MR81] (DTW). It can be understood as the temporal domain equivalent of instance-based learning.
In DTW, one begins with a set of template streams, each labelled with a class. Given an unlabelled input stream, the minimum distance between the input stream and each template is computed, and the class is attributed to the nearest template.
The cleverness of dynamic time warping lies in the computation of the
distance between input streams and templates. Rather than comparing
the value of the input stream at time
to the template stream at
time
, an algorithm is used that searches the space of mappings
from the time sequence of the input stream to that of the template
stream, so that the total distance is minimised. This is not always a
linear mapping; for example, we may find that time
in the input
stream corresponds to the time
in the template stream, whereas
in the input stream corresponds to
in the template
stream. The search space is constrained to reasonable bounds, such as
the mapping function from input time to template time must be
monotonically nondecreasing (in other words, the sequence of events
between input and template is preserved).
This is shown in Figure 3.4, where the horizontal axis
represents the time of the input stream, and the vertical axis
represents the time sequence of the template stream. The path shown
results in the minimum distance between the input and template
streams. The shaded in area represents the search space for the input
time to template time mapping function. Any monotonically
nondecreasing path within the space is an alternative to be
considered. Using dynamic programming techniques, the search for the
minimum distance path can be done in polynomial time:
where
is the length of the sequence, and
is the number of templates
to be considered.
There are several weaknesses of DTW. Firstly, it is still
,
which is not particularly fast. Secondly, a distance metric between
frames must be defined, which may be difficult with different channels
with distinct characteristics. Thirdly, it does not create any
meaningful descriptions of the data. Fourthly, creation of the
template streams from data is non-trivial - and typically is
accomplished by pairwise warping of training instances.
Alternatively, all observed instances are stored as templates, but
this is incredibly slow. Fifthly, although there have been several
attempts to extend dynamic time warping to strong TC tasks, these have
largely proved unsuccessful, and indeed prompted the move away from
dynamic time warping towards hidden Markov models.
DTW is relatively simple and has been used commercially for several industrial weak TC tasks, like isolated spoken digit recognition.
Recently, there has been a resurgence in interest in dynamic time warping. Keogh and Pazzani [KP99,KP01] improve the performance of DTW by representing a time series hierarchically. Oates et al [OSC00] also use dynamic time warping to cluster experiences of mobile robots from real world data.