
Glove-Talk and Glove-TalkII -- Fels and Hinton's contribution

Fels and Hinton's work [FH93] and Fels' work [Fel94] take a completely different tack to solving the problem of allowing non-vocal people to communicate, by creating a connection between the signer and a speech synthesiser through an ``adaptive interface'' -- an interface which changes over time to better match the behaviour of the user of the interface.

Fels used a VPL DataGlove Mark II for this purpose, with a Polhemus tracker attached for position and orientation tracking. In Glove-Talk, the handshape determined a root word and the direction of movement determined its ending. Hand speed and displacement affected the speed of speech and the stress respectively. A separate neural network was used for each of these, with an additional network (the ``strobe network'') used for sign separation. The strobe network was by far the most complex, since it used five pieces of information (including velocity and acceleration) derived from 10 consecutive frames to determine its output. It was also the hardest to train, since it had to be told when an action was beginning or ending, which is not easy to do.
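To make the size of the strobe network's input concrete, the following is a minimal sketch of how five values from each of 10 consecutive frames could be flattened into a single input vector. It is an illustration under stated assumptions, not the authors' implementation: the function name `strobe_inputs`, the sliding-window scheme and the dummy data are all hypothetical, and only the two constants come from the description above.

```python
from collections import deque

FRAMES_PER_WINDOW = 10   # from the text: 10 consecutive frames
FEATURES_PER_FRAME = 5   # from the text: five pieces of information per frame


def strobe_inputs(frames):
    """Yield one flat input vector per full window of consecutive frames.

    Hypothetical helper: a sliding window is assumed here, which the
    text does not confirm.
    """
    window = deque(maxlen=FRAMES_PER_WINDOW)
    for frame in frames:
        if len(frame) != FEATURES_PER_FRAME:
            raise ValueError("each frame needs exactly five features")
        window.append(tuple(frame))
        if len(window) == FRAMES_PER_WINDOW:
            # Oldest frame first, newest frame last.
            yield [value for f in window for value in f]


# Illustrative data only: 12 dummy frames, frame t holding five copies of t.
dummy_frames = [[float(t)] * FEATURES_PER_FRAME for t in range(12)]
vectors = list(strobe_inputs(dummy_frames))
```

Under these assumptions each vector has 10 x 5 = 50 entries, and 12 frames produce three overlapping windows.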

The outputs of the neural networks were then connected to a speech synthesiser (a DECTalk module in this case) to form words.

However, the system was computationally very intensive because of the large neural networks employed (the ``root handshape'' network alone, for example, had 16 input nodes, 80 hidden nodes and 66 output nodes), and it required an SGI 4D/240S to process the data in real time.
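As a rough indication of what those layer sizes imply, the sketch below counts the trainable parameters of a 16-80-66 network. It assumes a fully connected feed-forward network with one bias per hidden and output unit; neither assumption is stated in the description above, and `count_parameters` is a hypothetical helper.

```python
def count_parameters(layer_sizes, with_biases=True):
    """Count weights (and optionally biases) between consecutive layers.

    Assumes every unit in one layer connects to every unit in the next.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out      # connection weights
        if with_biases:
            total += n_out         # one bias per receiving unit (assumed)
    return total


ROOT_HANDSHAPE_LAYERS = [16, 80, 66]  # input, hidden, output (from the text)

weights_only = count_parameters(ROOT_HANDSHAPE_LAYERS, with_biases=False)
with_biases = count_parameters(ROOT_HANDSHAPE_LAYERS)

print(weights_only)  # 6560
print(with_biases)   # 6706
```

Under these assumptions the root handshape network alone carries 16 x 80 + 80 x 66 = 6560 weights, or 6706 parameters once biases are included.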

In Glove-TalkII, this system was refined and made practical for general-purpose use. A gesture-to-formant model was employed. Instead of the five networks used above, three networks were used. One was the Vowel/Consonant decider, which chose whether the current sound was a vowel or a consonant, and the other two were the individual vowel and consonant selectors. In addition, a foot pedal was used to control volume, and a keyboard had to be used for the ``stop'' sounds (like B, D, G, K, P), since these proved difficult to generate by hand (the motion would have had to be too fast). The system was taught to adapt dynamically to different users, based on their interpretation of an initial mapping. A CyberGlove was also used in place of the DataGlove, providing more information about hand movement (since the DataGlove lacks some of the sensors of the CyberGlove).
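The three-network arrangement can be pictured as a decider that routes each input to one of two selectors. The sketch below is only an illustration of that routing: `select_sound`, the `openness` feature and the three stand-in functions are hypothetical and take the place of trained networks. A hard either/or switch is assumed, which the description above does not confirm.

```python
def select_sound(hand_features, is_vowel, pick_vowel, pick_consonant):
    """Route one frame of hand features to the vowel or consonant selector.

    The three callables stand in for the Vowel/Consonant decider and the
    two selector networks; a hard switch between them is an assumption.
    """
    if is_vowel(hand_features):
        return ("vowel", pick_vowel(hand_features))
    return ("consonant", pick_consonant(hand_features))


# Hypothetical stand-ins for the trained networks, for illustration only.
def toy_decider(features):
    return features["openness"] > 0.5   # invented feature and threshold


def toy_vowel_selector(features):
    return "a"


def toy_consonant_selector(features):
    return "s"


open_hand = select_sound({"openness": 0.9}, toy_decider,
                         toy_vowel_selector, toy_consonant_selector)
closed_hand = select_sound({"openness": 0.1}, toy_decider,
                           toy_vowel_selector, toy_consonant_selector)
```

With these stand-ins, `open_hand` is routed to the vowel selector and `closed_hand` to the consonant selector.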






waleed@cse.unsw.edu.au