Fels and Hinton's work [FH93] and Fels' work [Fel94] take a completely different approach to allowing non-vocal people to communicate: they connect the signer to a speech synthesiser through an ``adaptive interface'', that is, an interface which changes over time to better match the behaviour of its user.
Fels employed a VPL DataGlove Mark II for this purpose, with a
Polhemus tracker attached to measure hand position and orientation.
The resulting system, Glove-Talk, determined a root word from the
handshape and the word's ending from the direction of movement. Hand
speed and displacement controlled the rate of speech and the stress,
respectively. A separate neural network was employed for each of
these, with an additional network (the ``strobe network'') used for
sign separation.
The strobe network was by far the most complex, since it used five
pieces of information derived from 10 consecutive frames ($x$, $y$ and
$z$ position, velocity and acceleration) to determine its output. It
was also the hardest to train, since it had to be told when each action
began and ended, and these boundaries are not easy to mark.
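To make the strobe network's input concrete, the following is a minimal sketch of how a windowed input vector could be assembled from tracker frames. It assumes that each frame supplies an $(x, y, z)$ position, that velocity and acceleration are scalar magnitudes estimated by finite differences between consecutive frames, and that the five features of the 10 most recent frames are simply concatenated. The function names `frame_features` and `strobe_input` are hypothetical and do not come from Glove-Talk.

```python
import math
from typing import List, Sequence, Tuple

Position = Tuple[float, float, float]

WINDOW = 10    # consecutive frames seen by the strobe network
FEATURES = 5   # assumed per-frame features: x, y, z, speed, acceleration


def frame_features(positions: Sequence[Position], dt: float) -> List[List[float]]:
    """Return [x, y, z, speed, acceleration] for every frame.

    Speed and acceleration are scalar finite-difference estimates (an assumption);
    frames without enough history get 0.0 for the missing quantity.
    """
    feats: List[List[float]] = []
    prev_speed = 0.0
    for i, (x, y, z) in enumerate(positions):
        speed = math.dist((x, y, z), positions[i - 1]) / dt if i >= 1 else 0.0
        accel = (speed - prev_speed) / dt if i >= 2 else 0.0
        feats.append([x, y, z, speed, accel])
        prev_speed = speed
    return feats


def strobe_input(positions: Sequence[Position], dt: float) -> List[float]:
    """Flatten the features of the last WINDOW frames into one 50-element vector."""
    if len(positions) < WINDOW:
        raise ValueError(f"need at least {WINDOW} frames, got {len(positions)}")
    window = frame_features(positions, dt)[-WINDOW:]
    return [value for frame in window for value in frame]
```

For a hand moving steadily along the $x$ axis, each frame in the window contributes its position, a constant speed and zero acceleration, giving a 10 × 5 = 50-element input vector.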
The outputs of the neural networks were then passed to a speech synthesiser (in this case a DECtalk module) to form words.
However, the system was computationally intensive because of the large neural networks employed (the ``root handshape'' network alone, for example, has 16 input nodes, 80 hidden nodes and 66 output nodes), and it required an SGI 4D/240S to process the data in real time.
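As a rough illustration of the cost, the sketch below implements a forward pass through a fully connected network with the 16-80-66 layout quoted for the root handshape network and counts its weights. The sigmoid hidden units, softmax output and random initial weights are assumptions chosen for illustration; this is not Fels' trained network, and the class name `RootHandshapeNet` is hypothetical.

```python
import math
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights: n_out rows of n_in, biases)


def make_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Small random weights and zero biases; illustrative only."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x: List[float], weights: List[List[float]], biases: List[float]) -> List[float]:
    """Affine layer: one dot product per output unit, plus its bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def sigmoid(v: List[float]) -> List[float]:
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def softmax(v: List[float]) -> List[float]:
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


class RootHandshapeNet:
    """16 inputs -> 80 sigmoid hidden units -> 66 outputs (one per output class)."""

    def __init__(self, n_in: int = 16, n_hidden: int = 80, n_out: int = 66, seed: int = 0):
        rng = random.Random(seed)
        self.hidden = make_layer(n_in, n_hidden, rng)
        self.output = make_layer(n_hidden, n_out, rng)

    def forward(self, x: List[float]) -> List[float]:
        h = sigmoid(dense(x, *self.hidden))
        return softmax(dense(h, *self.output))

    def parameter_count(self) -> int:
        return sum(len(w) * len(w[0]) + len(b) for w, b in (self.hidden, self.output))
```

The weight count is 16 × 80 + 80 + 80 × 66 + 66 = 6,706, so a single frame through this one network costs about 6,560 multiply-adds before any of the other networks are evaluated.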
In Glove-TalkII, this system was refined and made practical for
general-purpose use. It employed a gesture-to-formant model, in which
hand gestures are mapped to the control parameters of a formant speech
synthesiser rather than to whole words. Instead of the five networks
used in Glove-Talk, three networks were used. The first, the
vowel/consonant decider, chooses whether the current sound is a vowel
or a consonant; the other two select the particular vowel and
consonant, respectively. In addition, a foot pedal was used to control
volume, and a keyboard had to be used for the ``stop'' sounds (such as
B, D, G, K and P), since these proved difficult to generate by hand:
the required motions would have been too fast.
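One plausible way to combine the three networks is to let the vowel/consonant decider's output, a value between 0 and 1, weight the outputs of the vowel and consonant networks before they reach the synthesiser. The sketch below assumes this soft blending and treats each network as a hypothetical callable that returns a list of synthesiser parameters; `blend_outputs` and the stand-in networks are illustrative, not Glove-TalkII's code.

```python
from typing import Callable, List, Sequence

Params = List[float]
Net = Callable[[Sequence[float]], Params]


def blend_outputs(hand: Sequence[float],
                  vc_decider: Callable[[Sequence[float]], float],
                  vowel_net: Net,
                  consonant_net: Net) -> Params:
    """Mix vowel and consonant parameters by the decider's vowel probability (assumed)."""
    p_vowel = vc_decider(hand)
    if not 0.0 <= p_vowel <= 1.0:
        raise ValueError("decider output must lie in [0, 1]")
    vowel = vowel_net(hand)
    consonant = consonant_net(hand)
    if len(vowel) != len(consonant):
        raise ValueError("both networks must produce the same number of parameters")
    # Linear interpolation between the two candidate parameter vectors.
    return [p_vowel * v + (1.0 - p_vowel) * c for v, c in zip(vowel, consonant)]


# Usage with made-up stand-in networks:
vowel_net = lambda hand: [1.0, 2.0]
consonant_net = lambda hand: [3.0, 6.0]
mixed = blend_outputs([0.0], lambda hand: 0.25, vowel_net, consonant_net)  # [2.5, 5.0]
```

Blending rather than switching means that, in this sketch, an uncertain decider produces intermediate parameters instead of an abrupt jump between a vowel and a consonant.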
The system was trained to adapt dynamically to different users, based
on each user's interpretation of an initial mapping. A CyberGlove was
also used in place of the DataGlove, providing more information about
hand movement, since the DataGlove lacks some of the CyberGlove's
sensors.
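The adaptation to a new user can be pictured as fine-tuning: start from the weights of the initial mapping, record the gestures the user makes for a set of target sounds, and take gradient steps that move the mapping's outputs for those gestures towards the targets. The sketch below is a simplification under stated assumptions: a single linear layer trained by stochastic gradient descent on squared error, with hypothetical names `predict` and `adapt`. It is not Glove-TalkII's actual training procedure.

```python
from typing import List, Sequence, Tuple

Weights = List[List[float]]  # one row of input weights per output parameter
Sample = Tuple[Sequence[float], Sequence[float]]  # (user's gesture, intended target)


def predict(weights: Weights, x: Sequence[float]) -> List[float]:
    """Linear mapping from gesture features to output parameters."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def adapt(weights: Weights, samples: Sequence[Sample],
          lr: float = 0.1, epochs: int = 200) -> Weights:
    """Fine-tune a copy of the initial mapping on one user's (gesture, target) pairs."""
    w = [row[:] for row in weights]  # leave the initial mapping untouched
    for _ in range(epochs):
        for x, target in samples:
            out = predict(w, x)
            for j, (o, t) in enumerate(zip(out, target)):
                err = o - t  # gradient of 0.5 * err**2 with respect to o
                for i, xi in enumerate(x):
                    w[j][i] -= lr * err * xi
    return w


# A user whose interpretation swaps the two outputs of an identity mapping:
initial = [[1.0, 0.0], [0.0, 1.0]]
samples = [([1.0, 0.0], [0.0, 1.0]), ([0.0, 1.0], [1.0, 0.0])]
adapted = adapt(initial, samples)
```

After adaptation, the gesture `[1.0, 0.0]` maps to approximately `[0.0, 1.0]`, matching this user's interpretation, while the shared initial mapping is left unchanged for other users.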