next up previous contents
Next: Recurrent neural nets Up: Related work Previous: Related work

Hidden Markov models

  Hidden Markov models (HMMs) are a well-developed technology for classification of multivariate data that have been used extensively in speech recognition, handwriting recognition and even sign recognition. They have been applied for decades. Due to space considerations, an in-depth analysis of HMMs isn't possible. The interested reader may wish to have a look at [RJ86].

HMMs consist of states, possible transitions between states (and the probability of those transitions being taken) and a probability that in a particular state, a particular observation is made. An observation can be anything of interest. For example, it could be a frame in our definition, or it could be a symbol of some kind. HMMs are termed hidden because the state of the HMM can not, in general, be known by looking at the observations. They are Markov in the sense that the probability of observing an output depends only on the current state and not on previous states. By looking at the observations, using an algorithm known as the Viterbi Algorithm, an estimate of the probability that a particular instance or stream observed was generated by that HMM can be computed. We can have one HMM for every class we are interested in, then compute the probability of the particular stream we saw being generated by each of the HMMs, then choose the most probable one.

   figure574
Figure 2.1: A simple HMM where the only observations are j, k, or l

For example, consider figure 2.1. This is a very simple hidden Markov model, with three states. At each timestep, a j, k, or l is observed. We can work out the probability that the sequence [j,j,k,l] was generated by this HMM by considering the most probable state transitions that would generate this output, and computing the probability that it would occur (which is the probability of each output times the probability of the transitions). In this case, the most probable sequence of state transitions (given we start in state 1) is: [1,2,3,3]. The probability of this is tex2html_wrap_inline1667 . If we had other HMMs, we could compare the probability for each, and choose the HMM that produced the highest probability for the given observed sequence. If we had trained each HMM to have an appropriate set of parameters for a particular class, then by choosing the HMM which generated the observations with the highest probability, we would also know the class.

Note that observations can be very complex, and need not be as simple as a single letter j, k or l. In general, people who use HMMs model multivariate streams by representing each frame as an observation. The probability of a particular frame is estimated by using a Gaussian mixture over the channels.

The Baum-Welch reestimation algorithm provides a way for the probabilities of both transitions and of observations within states to be estimated from training data.

HMMs have proved to be very effective in practice, producing high levels of classification accuracy, but they fail to meet several of the goals discussed in section 1.1, namely:


next up previous contents
Next: Recurrent neural nets Up: Related work Previous: Related work

Mohammed Waleed Kadous
Tue Oct 6 13:04:40 EST 1998