next up previous contents
Next: Recent interest from the Up: Established techniques Previous: Recurrent Neural Networks   Contents

Dynamic Time Warping

An older technique that has fallen out of favour since the advent of HMMs is dynamic time warping [MR81] (DTW). It can be understood as the temporal domain equivalent of instance-based learning.

In DTW, one begins with a set of template streams, each labelled with a class. Given an unlabelled input stream, the minimum distance between the input stream and each template is computed, and the class is attributed to the nearest template.

The cleverness of dynamic time warping lies in the computation of the distance between input streams and templates. Rather than comparing the value of the input stream at time $ t$ to the template stream at time $ t$, an algorithm is used that searches the space of mappings from the time sequence of the input stream to that of the template stream, so that the total distance is minimised. This is not always a linear mapping; for example, we may find that time $ t_1$ in the input stream corresponds to the time $ t_1+5$ in the template stream, whereas $ t_2$ in the input stream corresponds to $ t_2-3$ in the template stream. The search space is constrained to reasonable bounds, such as the mapping function from input time to template time must be monotonically nondecreasing (in other words, the sequence of events between input and template is preserved).

Figure 3.4: Dynamic time warping.
\begin{figure}\begin{center}
\leavevmode \epsfxsize =5in \epsfbox{dtw.eps}\par\centering\centering\end{center}\end{figure}

This is shown in Figure 3.4, where the horizontal axis represents the time of the input stream, and the vertical axis represents the time sequence of the template stream. The path shown results in the minimum distance between the input and template streams. The shaded in area represents the search space for the input time to template time mapping function. Any monotonically nondecreasing path within the space is an alternative to be considered. Using dynamic programming techniques, the search for the minimum distance path can be done in polynomial time: $ O(N^2V)$ where $ N$ is the length of the sequence, and $ V$ is the number of templates to be considered.

There are several weaknesses of DTW. Firstly, it is still $ O(N^2V)$, which is not particularly fast. Secondly, a distance metric between frames must be defined, which may be difficult with different channels with distinct characteristics. Thirdly, it does not create any meaningful descriptions of the data. Fourthly, creation of the template streams from data is non-trivial - and typically is accomplished by pairwise warping of training instances. Alternatively, all observed instances are stored as templates, but this is incredibly slow. Fifthly, although there have been several attempts to extend dynamic time warping to strong TC tasks, these have largely proved unsuccessful, and indeed prompted the move away from dynamic time warping towards hidden Markov models.

DTW is relatively simple and has been used commercially for several industrial weak TC tasks, like isolated spoken digit recognition.

Recently, there has been a resurgence in interest in dynamic time warping. Keogh and Pazzani [KP99,KP01] improve the performance of DTW by representing a time series hierarchically. Oates et al [OSC00] also use dynamic time warping to cluster experiences of mobile robots from real world data.


next up previous contents
Next: Recent interest from the Up: Established techniques Previous: Recurrent Neural Networks   Contents
Mohammed Waleed Kadous 2002-12-10