So far, tests have been limited to investigating how adept GRASP is at recognising signs from the same signer. How well does it cope with signs it has learnt when they are made by signers it has not seen?
Obviously, it might be useful to allow such learning if the effects on accuracy were not dramatic, since it would greatly reduce learning time. If the examples given to the learning algorithm were typical, this might allow GRASP to be trained on a general user base and then customised to suit each individual. This would reduce training time and thus allow GRASP to perform better ``out of the box''.
It could also result in better performance on more than one individual by providing more instances. As we saw in the previous section, the error rate is closely related to the number of samples for each sign. By using signs from other people, it might therefore be possible to get extra samples ``for free'', thus reducing the error rate. For example, if we had 4 people with 20 samples each, this would result in 80 samples per sign.
On the other hand, there are a number of phenomena that could work against such a system. With 80 samples per sign, it is likely that noise can influence the data more significantly. There are a number of solutions to this, such as using IBL3, the more noise-resistant version of IBL1. Furthermore, and more dangerously, there may be a problem with concept boundaries being different for different people. In that case adding extra samples would not help; it would confuse the learning algorithm by ``corrupting'' the feature space with noise.
So a test was performed. GRASP was trained on four of the data sets and then tested on the fifth. This was repeated with each of the other four datasets serving as the test set in turn. The results are shown in table 6.3. The name corresponds to the name of the person whose samples were used as the test dataset.
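The procedure just described can be sketched as follows. This is a minimal illustration only, not GRASP's actual code: the function names, the data layout (a dictionary from signer name to a list of attribute-vector and sign pairs) and the use of a plain nearest-neighbour classifier as a stand-in for the instance-based learners are all assumptions made for the example.

```python
from math import dist


def nearest_neighbour(train, query):
    """Return the sign of the training instance closest to `query`.

    `train` is a list of (attribute_vector, sign) pairs. This is a plain
    1-nearest-neighbour rule, used here as an assumed stand-in for the
    instance-based learners.
    """
    return min(train, key=lambda inst: dist(inst[0], query))[1]


def leave_one_signer_out(datasets):
    """Train on all signers but one and test on the one held out.

    `datasets` (hypothetical layout) maps a signer's name to a list of
    (attribute_vector, sign) pairs. Returns a dict mapping each signer's
    name to the error rate obtained when that signer is the test set.
    """
    errors = {}
    for held_out, test_set in datasets.items():
        # Pool the samples of every signer except the held-out one.
        train = [inst for name, data in datasets.items()
                 if name != held_out for inst in data]
        wrong = sum(nearest_neighbour(train, x) != y for x, y in test_set)
        errors[held_out] = wrong / len(test_set)
    return errors
```

Each entry of the returned dictionary would correspond to one row of a table such as table 6.3, keyed by the name of the person whose samples were held out.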
Table 6.3: Results of learning algorithms with inter-signer learning.
As we can see, the results are poor. On average, GRASP would get only one sign in every ten correct if it were used by people on whom it has not been trained.
There are a number of possible reasons for this. To investigate further, a ``confusion list'' was generated. This is a list of which signs were incorrectly classified, and what they were incorrectly classified as. A perusal of this list revealed an interesting occurrence -- the errors were not always random, but frequently a consistent error was made -- i.e. a given sign was consistently misinterpreted as another.
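A confusion list of this kind can be built by counting (actual sign, predicted sign) pairs among the errors. The sketch below is illustrative only; the function name and the input format (two parallel lists of labels) are assumptions, not a description of how GRASP produced its list.

```python
from collections import Counter


def confusion_list(actual_signs, predicted_signs):
    """Count how often each sign was mistaken for each other sign.

    Both arguments are parallel lists of sign labels. Correct
    classifications are ignored. The result is a list of
    ((actual, predicted), count) pairs, most frequent confusion first.
    """
    errors = Counter(
        (actual, predicted)
        for actual, predicted in zip(actual_signs, predicted_signs)
        if actual != predicted
    )
    return errors.most_common()
```

If the errors were random, the counts would be spread thinly over many pairs; a consistent error shows up as a single pair with a count close to the number of test samples for that sign.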
This suggests that the concepts for different people are different and that the problem is not with the learning algorithms themselves, but with the fact that while a particular sign for one person may represent a particular collection of attribute values, it may not be the same set of values for another person. In fact a particular set of attribute values for one signer may map to a different sign for another signer. This is shown in figure 6.5. Furthermore, this effect is probably more dramatic in our case, since in our validation of attributes, we selected attributes that were optimal for intra-signer recognition and not inter-signer recognition. There may in fact be a set of attributes that work well for inter-signer learning that we have not come up with.
Figure 6.5: What can happen when GRASP is trained on one set of users and then tested on another.
In the above diagram, we see that based on training examples from other signers we can develop concepts A and B. Note that this is all in attribute space -- and not in real-space, since if this happened with the concept boundary in real life, we would have difficulty adapting to unseen people's signing.
For a new signer, the concept boundaries in attribute space may not exactly match those that have been learnt, for two main reasons: aspects of that person's physical character that affect the way he/she signs, and that person's individual signing style.
Furthermore, ``inter-signer concept overlap'' can occur. In the diagram, there is a region where the concept learnt from the known signers overlaps with a different concept of the new signer -- in this case concept B of the new signer is confused with concept A of the known signers.
Since, as we saw above, the errors GRASP makes are consistent, it appears that we are indeed subject to the effects of inter-signer concept overlap.
To investigate further, the following test was undertaken: GRASP was trained on all but one of the people. The dataset from that person was halved, with one half used for training and the other half for testing.
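One way to set up such a test is sketched below. How the dataset was actually halved is not stated, so the per-sign alternating split is an assumption, as are the function names and the data layout (a dictionary from signer name to a list of attribute-vector and sign pairs). The second training set, which uses the signer's own half alone, is included for comparison.

```python
from collections import defaultdict


def halve_per_sign(samples):
    """Split (attribute_vector, sign) samples into two halves.

    Samples of each sign are assigned alternately to the two halves, so
    that every sign appears in both (an assumed splitting rule).
    """
    seen = defaultdict(int)
    first, second = [], []
    for attributes, sign in samples:
        target = first if seen[sign] % 2 == 0 else second
        target.append((attributes, sign))
        seen[sign] += 1
    return first, second


def partial_training_sets(datasets, new_signer):
    """Build the training and test sets for the partial-training test.

    Returns (train_with_others, train_alone, test), where
    `train_with_others` is every other signer's data plus half of the
    new signer's data, `train_alone` is that half only, and `test` is
    the new signer's remaining half.
    """
    own_train, test = halve_per_sign(datasets[new_signer])
    others = [inst for name, data in datasets.items()
              if name != new_signer for inst in data]
    return others + own_train, own_train, test
```

Comparing the error on `test` after training on `train_with_others` with the error after training on `train_alone` isolates the effect of adding other people's samples.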
The results, together with the corresponding values from the investigation of the number of samples, are shown in table 6.4.
Table 6.4: Results of learning algorithms with inter-signer learning, with partial training from that person.
The columns of the table show the error that occurs when tested on half of that person's dataset. In the first and third columns, the datasets from other people are used for training in addition to the other half of the dataset from that person. In the second and fourth, only the other half of the dataset is used, without using other people's training sets.
From the above, we can see that, to a limited and not too significant degree, such inter-signer concept confusion is indeed occurring, since the error rate is higher when including other people's signs than when only using that person's signs. But the effect on the error rate is not too bad, ranging between 0.3 and 5 per cent. Thus it is feasible that we might start off the system trained on other people, and then with use, GRASP will increase its accuracy. Another approach that might be worth investigating would be the use of some selection criterion for the instances kept -- such as keeping the twenty most recent versions of a sign, thus providing a balance between computation time required and accuracy.
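The ``twenty most recent versions'' criterion could be realised with a bounded queue per sign, as in the sketch below. The class and method names are hypothetical, and this is only one possible selection criterion, not something that was implemented or evaluated here.

```python
from collections import defaultdict, deque


class RecentInstanceStore:
    """Keep only the most recent instances of each sign (illustrative)."""

    def __init__(self, max_per_sign=20):
        # One bounded queue per sign; when a queue is full, appending a
        # new instance silently discards the oldest one.
        self._instances = defaultdict(lambda: deque(maxlen=max_per_sign))

    def add(self, attributes, sign):
        """Record a new instance of `sign`."""
        self._instances[sign].append(attributes)

    def instances(self):
        """Return all stored (attribute_vector, sign) pairs."""
        return [(attributes, sign)
                for sign, window in self._instances.items()
                for attributes in window]
```

With such a store, instances taken from other signers would gradually be displaced by the new signer's own samples as the system is used, while the number of instances compared against at classification time stays bounded.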
What we really want, however, is for the new signer concepts to transform to the known signer concepts (as indicated by the arrows in figure 6.5, which show the direction we would like the transformation to move the new signers' concepts). Such a transformation of the attributes may exist and may work in making it easier to use the system for new signers. As mentioned above, the transformation would compensate for two factors: firstly, that each signer has different physical limitations such as reach, and secondly that each signer has an individual style.
Several attempts were made to address the problems. These included:
It is believed that the stylistic aspects are the dominant cause of error. Even so, through analysis of the signs it might be possible to derive a suitable transformation by observing a small sample of the user's signs. This is similar to modern speech recognition techniques, where a new user is asked to pronounce a small but significant subset of the system's lexicon before using it, from which a mapping to the standard set of instances is developed. Of course, a complicating issue is that English can be segmented into a known collection of sub-sounds, while there is still some discussion over the feasibility of such an approach in sign language recognition.
However, trying to find optimal transformations is a thesis in itself and is certainly non-trivial.
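Purely to illustrate the simplest form such a mapping could take, the sketch below fits a separate scale and offset for each attribute from a small calibration sample, so that the new signer's attribute values have the same mean and spread as those of the known signers. This is an assumed, deliberately naive transformation: it was not one of the approaches tested, the function names are hypothetical, and at best it could compensate for differences in range such as reach, not for differences in style.

```python
from statistics import mean, pstdev


def fit_attribute_mapping(new_signer_samples, known_signer_samples):
    """Fit a per-attribute (scale, offset) pair from calibration data.

    Each argument is a list of attribute vectors of equal length,
    assumed to come from the same small set of calibration signs.
    """
    mapping = []
    new_columns = zip(*new_signer_samples)
    known_columns = zip(*known_signer_samples)
    for new_col, known_col in zip(new_columns, known_columns):
        new_sd = pstdev(new_col)
        # Leave the scale alone if the new signer's values do not vary.
        scale = pstdev(known_col) / new_sd if new_sd > 0 else 1.0
        offset = mean(known_col) - scale * mean(new_col)
        mapping.append((scale, offset))
    return mapping


def apply_mapping(mapping, attributes):
    """Map one of the new signer's attribute vectors into the learnt space."""
    return tuple(scale * value + offset
                 for (scale, offset), value in zip(mapping, attributes))
```

A new signer's samples would be passed through `apply_mapping` before classification, leaving the stored instances from the known signers unchanged.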
An alternative and perhaps more difficult approach would be to try to extract attributes that are optimally suited to inter-signer recognition. The attributes we have selected are effective for intra-signer recognition; it may be that for inter-signer recognition a different set of attributes is required altogether. Each new set of attributes proposed would have to be validated in a way similar to that of chapter 5, but using inter-signer validation rather than intra-signer validation.