An older technique that been fallen out of favour since the advent
of hidden Markov models is dynamic time warping [MR81]. In
some senses, it is the temporal domain equivalent of instance-based
learning with a complex distance function. In dynamic time warping,
the problem is represented as finding the minimum distance between a
set of template streams and the input stream. The class chosen is the
``closest'' template. However, rather than using a straightforward
technique of comparing the value of the input stream at time t to
the template stream at time t, an algorithm is used that tries to
search the space of mappings from the time sequence of the input
stream to that of the template stream, so that the total distance is
minimised. This is not always a linear path; for example, we may find
that time
in the input stream corresponds to the time
in
the template stream, whereas
in the input stream corresponds to
in the template stream. The search space is constrained to
reasonable bounds, such as the mapping function from input time to
template time must be monotonically increasing (in other words, the
sequence of events between input and template is preserved).
Figure 2.5: Dynamic time warping
This is shown in figure 2.5, where the horizontal axis represents the time sequence of the input stream, and the vertical axis represents the time sequence of the template stream. The path shown results in the minimum distance between the input and template streams. The shaded in area represents the search space for the input time to template time mapping function.
Using dynamic programming techniques, the search for the minimum
distance path can be done in polynomial time. However, dynamic time
warping has some limitations. Whereas the calculations can be
accomplished in polynomial time, this polynomial is approximately
O(
) where N is the length of the sequence, and V is the
number of templates to be considered. This means that practical
applications may be quite slow. In addition, they require the
definition of a distance function between the values of the stream at
a given time frame. This may be easy if the stream is univariate, but
it is more difficult if the stream is, as in our case, a multivariate
one. Furthermore, it does not create any meaningful descriptions of
the data. In addition, creation of the template streams from data is
non-trivial and typically depends on the order of presentation of the
training instances, since if we have eight instances from which we are
trying to create a template, we have to warp them to each other in
pairs, and repeat the process until we arrive at a single template.