Algorithmic learners usually get better with practice: the more examples you give them, the better they get at discriminating between classes. Notice the ``usually'' -- several phenomena can begin to occur if you give too much (or, more particularly, too much noisy) data. One of the most common is over-fitting of the data. If too many samples are given, learners can develop concepts that are too tightly constrained, so that anything that does not match the concept fairly closely is not accepted as a member of that concept.
Various approaches exist for handling the problem, each adapted to a particular algorithm. For example, with symbolic decision tree learners, the tree is ``pruned'' of nodes that do not perform well or do not generalise. In instance-based learning, there are also ways of culling the instances, such as IBL3.
Still, it is logical to expect that the error rate will fall if we have more samples of a sign, because the more samples a learning algorithm has, the better the concept description it is able to develop.
To test the impact that changing the number of training instances has on the error rate, we tested each of the three large data sets with an increasing number of training instances for each sign -- from 2 instances for each sign up to 14, in increments of 2. The training examples used were selected at random. The remainder of the data set was used for testing.
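The procedure just described can be sketched in Python as follows. This is only an illustrative sketch, not the code used for the experiments: the data layout (a list of `(features, sign)` pairs), the function names `split_per_sign` and `learning_curve`, and the `train_and_score` callback (assumed to train a learner and return its error rate on the test set) are all hypothetical.

```python
import random
from collections import defaultdict


def split_per_sign(examples, n_train, rng):
    """Pick n_train random examples of each sign for training.

    `examples` is assumed to be a list of (features, sign) pairs.
    Everything not selected for training goes into the test set.
    """
    by_sign = defaultdict(list)
    for example in examples:
        by_sign[example[1]].append(example)

    train, test = [], []
    for sign, items in by_sign.items():
        if len(items) <= n_train:
            raise ValueError(f"not enough examples of sign {sign!r}")
        shuffled = items[:]          # copy, so the caller's data is untouched
        rng.shuffle(shuffled)
        train.extend(shuffled[:n_train])
        test.extend(shuffled[n_train:])
    return train, test


def learning_curve(examples, train_and_score, sizes=range(2, 15, 2), seed=0):
    """Return (samples per sign, error rate) pairs for 2, 4, ..., 14.

    `train_and_score(train, test)` is a hypothetical callback that trains
    a learner on `train` and returns its error rate on `test`.
    """
    rng = random.Random(seed)
    curve = []
    for n in sizes:
        train, test = split_per_sign(examples, n, rng)
        curve.append((n, train_and_score(train, test)))
    return curve
```

Selecting the training examples separately for each sign keeps the number of training instances per sign exactly equal, which is the quantity being varied.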
The results are shown in figure 6.1. As can be seen, the error rate decreases as the number of samples per sign increases. This effect, as we would expect, tapers off: the first few samples help a great deal, but subsequent samples help only a little more.
Figure 6.1: The effect that the number of samples has on the error rate.
A logarithmic relationship was hypothesised, so the graph was re-plotted with a logarithmic axis (figure 6.2).
Figure 6.2: The effect of the number of samples on the error rate, this time with a logarithmic x-axis.
From the appearance of figure 6.2, there seems to be strong support for the hypothesis that a logarithmic relationship exists between the number of samples per sign and the error rate.
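One way to go beyond visual inspection would be to fit the model error = a + b ln(n) by least squares, where n is the number of samples per sign. The sketch below is an assumption about how such a check could be done, not part of the original analysis. The function name `fit_log_model` is hypothetical, and the numbers in the usage lines are synthetic values generated from an exact logarithmic curve, not the measured results.

```python
import math


def fit_log_model(samples_per_sign, error_rates):
    """Least-squares fit of error = a + b * ln(n).

    Returns (a, b, r_squared). An r_squared close to 1 means the points
    lie close to a straight line on a plot with a logarithmic x-axis.
    """
    xs = [math.log(n) for n in samples_per_sign]
    ys = list(error_rates)
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    b = sxy / sxx                    # slope against ln(n)
    a = mean_y - b * mean_x          # intercept
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return a, b, r_squared


# Synthetic illustration only: error rates generated from an exact log curve.
sizes = [2, 4, 6, 8, 10, 12, 14]
synthetic_errors = [0.5 - 0.1 * math.log(n) for n in sizes]
a, b, r_squared = fit_log_model(sizes, synthetic_errors)
```

Fitting each learner's curve separately gives one slope b per learner. A more negative b would correspond to a steeper downward gradient on the logarithmic plot.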
Also note that C4.5 does not appear to learn significantly faster than IBL: in figure 6.2, neither learner has a noticeably steeper gradient than the other. This indicates that the number of samples per sign does not alter the relationship between the error rates of C4.5 and IBL, even as the number of samples grows. It might have been, for example, that C4.5 would have been the better choice in the long term because of a steeper (more negative) gradient. This hypothesis, however, is not supported by the data.