Anyone acquainted with pattern recognition in any field will have come across histograms. Surprisingly, however, their application to gesture recognition is, to the best of my knowledge, novel.
A histogram can be thought of as a discretised probability density function. Basically, we segment the range of possible values into subranges, and then count the number of instances that fall into each subrange.
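As a minimal sketch of this segment-and-count idea, the following Python fragment counts how many values fall into each of d equal subranges. The function name `count_histogram`, the fixed range [lo, hi] and the sample values are assumptions made purely for illustration.

```python
def count_histogram(values, lo, hi, d):
    """Count how many values fall into each of d equal subranges of [lo, hi]."""
    if d < 1 or hi <= lo:
        raise ValueError("need d >= 1 and hi > lo")
    width = (hi - lo) / d
    counts = [0] * d
    for v in values:
        if lo <= v <= hi:
            # A value equal to hi is counted in the last subrange.
            i = min(int((v - lo) / width), d - 1)
            counts[i] += 1
    return counts


# Illustrative values only: five numbers in [0, 1], four subranges.
print(count_histogram([0.1, 0.2, 0.25, 0.7, 0.95], 0.0, 1.0, 4))  # [2, 1, 1, 1]
```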
It is exactly this segment-and-count procedure that we apply to the signs, on a number of aspects of each sign. One obvious histogram to take is of the x, y and z positions of the hand.
We have to be a bit clever about how we do this, however, to get useful information. In particular, if a sign is slightly more exaggerated, in either time or space, we want its histogram to remain reasonably invariant.
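Thus we calculate the histograms in the following way: Let $d$ be the number of divisions we wish to divide the ranges into, and let $h_i$ with $0 \le i \le d-1$ be the ``columns'' of the histogram. Assume we are doing it for the x-position only, with samples $x_1, \ldots, x_n$. Then:

\[ h_i = \frac{1}{n} \sum_{t=1}^{n} g_i(x_t) \]

where

\[ g_i(x) = \begin{cases} 1 & \text{if } x_{\min} + i w \le x < x_{\min} + (i+1) w \\ 0 & \text{otherwise} \end{cases} \qquad w = \frac{x_{\max} - x_{\min}}{d} \]

with $x_{\min}$ and $x_{\max}$ the smallest and largest x-positions in the sign, and a sample equal to $x_{\max}$ counted in the last column.

Effectively, this ``normalises'' by two things:

Time: since there is the $\frac{1}{n}$ term, and every $x_t$ must fall into exactly one column, the net effect is that $\sum_{i=0}^{d-1} h_i = 1$.

Space: since the columns span exactly the range from $x_{\min}$ to $x_{\max}$, the outermost columns $h_0$ and $h_{d-1}$ above will not be zero. This is desirable, since it essentially removes the issue of the size of a sign, and of low resolution on small signs, with lots of empty columns. The alternative would be to have absolute locations, which would be nowhere near as closely correlated with the information in the sign itself.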
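The calculation can be sketched in Python as follows. This is an illustration under stated assumptions rather than the implementation used in this work: the function name `normalised_histogram`, the handling of a motionless sign (all samples placed in column 0) and the sample data are all hypothetical.

```python
def normalised_histogram(xs, d):
    """Fraction of samples in each of d equal columns spanning [min(xs), max(xs)]."""
    if d < 1 or not xs:
        raise ValueError("need d >= 1 and at least one sample")
    n = len(xs)
    x_min, x_max = min(xs), max(xs)
    span = x_max - x_min
    counts = [0] * d
    for x in xs:
        if span == 0:
            i = 0  # assumption: a motionless sign goes entirely into column 0
        else:
            # A sample equal to x_max is counted in the last column.
            i = min(int(d * (x - x_min) / span), d - 1)
        counts[i] += 1
    # Dividing by n is the normalisation in time.
    return [c / n for c in counts]


# Made-up x-positions for one sign.
sign = [0.0, 1.0, 1.0, 2.0, 4.0]
larger = [10 * x for x in sign]            # same sign, exaggerated in space
slower = [x for x in sign for _ in (0, 1)]  # same sign, every sample held twice

print(normalised_histogram(sign, 4))    # [0.2, 0.4, 0.2, 0.2]
print(normalised_histogram(larger, 4))  # [0.2, 0.4, 0.2, 0.2]
print(normalised_histogram(slower, 4))  # [0.2, 0.4, 0.2, 0.2]
```

All three calls give the same columns, which is the invariance to exaggeration in space and time that the normalisation is meant to provide.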
To clarify what a histogram means, a 5-division histogram for the sign `same' (which we saw in figure 5.1) is shown in figure 5.2.
Figure 5.2: An example of a histogram -- in this case of motion in the x-dimension.
As you can see, there is a strong presence in the middle; this isn't a surprise, since this is where the sign starts. The histogram is weak to the right, since that column seems to hold only a few points resulting from ``swinging back'' too far on the return. On the left hand side, however, there is another strong peak, which is caused by the slowing down of the motion that occurs in the sign.
In this set of features, we consider the histograms of the x-position, the y-position and the z-position. Histograms of other data will follow.
There is one question which cannot be answered by theory alone: what is the optimum number of divisions d, that is, the number that results in the smallest error? The answer depends on a number of factors, such as the accuracy of the equipment and the nature of the signs themselves. Too few divisions, and there will be insufficient information to distinguish signs; too many divisions, and noise and variation in the data will cause crossover between adjacent columns in the histogram, resulting in erroneous classification.
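The trade-off can be illustrated with a toy experiment. The sketch below is hypothetical throughout: the three sample sequences are invented, the helper names `normalised_histogram` and `l1_distance` are assumptions, and the sum of absolute column differences stands in for whatever comparison a real classifier would make. It does not answer the question for real data, where the best d still has to be found empirically.

```python
def normalised_histogram(xs, d):
    """Fraction of samples in each of d equal columns spanning [min(xs), max(xs)]."""
    n = len(xs)
    x_min, x_max = min(xs), max(xs)
    span = x_max - x_min
    counts = [0] * d
    for x in xs:
        i = 0 if span == 0 else min(int(d * (x - x_min) / span), d - 1)
        counts[i] += 1
    return [c / n for c in counts]


def l1_distance(p, q):
    """Sum of absolute differences between two histograms of equal length."""
    return sum(abs(a - b) for a, b in zip(p, q))


# Invented data: a sign, a slightly perturbed repeat of it, and a different sign.
sign_a = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
sign_a_noisy = [0.0, 1.5, 2.0, 3.0, 4.5, 5.0, 6.0, 7.5, 8.0]
sign_c = [0.0, 0.0, 0.0, 1.0, 8.0, 8.0, 8.0, 8.0, 8.0]

for d in (2, 4, 16):
    h_a = normalised_histogram(sign_a, d)
    same_sign = l1_distance(h_a, normalised_histogram(sign_a_noisy, d))
    other_sign = l1_distance(h_a, normalised_histogram(sign_c, d))
    print(d, round(same_sign, 3), round(other_sign, 3))
```

On this toy data, d = 2 gives identical histograms for the two different signs, d = 4 separates them while the perturbed repeat still matches the original exactly, and d = 16 pushes the perturbed samples into neighbouring columns so that the repeat no longer matches.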