A k-means clustering algorithm was used initially used over the whole domain; one clusterer per channel. The k-means algorithm is shown in figure 6.1.
However, there are problems with such a technique: k-means requires the number of clusters to be specified beforehand. Determining the number of clusters is not easy. In addition, a quick check of the parameter space indicated that it appeared that clustering over all data would not be productive, since there did not appear to be any obvious clustering of the space. For example, figure 6.2 shows two dimensions (of four) of the parameter space of the events on the y channel, with the points coming from all ten classes. It is also difficult to determine the initial choice of centroids C.
Figure 6.2: Parameter space with all classes
Figure 6.3: Parameter space with come class only
So, as an alternative, we used per-class k-means clustering. For each class and each channel we clustered the events that came from that class only. Firstly, determining the number of clusters can be done by taking the average number of events over the set of training instances belong to the class; secondly, because things belonging to the same class are likely to have similar events, clustering is more likely to succeed. For example, one of the signs considered for learning was the sign come. Looking at the same two dimensions as we did for figure 6.2, but only plotting events belonging to five examples from the class come produces 6.3. The five examples had 5, 3, 3, 5 and 4 events on the y channel, with an average of four events. Thus the algorithm was told to make four clusters. Finally, C can be guessed by doing an initial clustering based on the order of the events from each instance (so all the ``first'' events go in one cluster, all the ``second'' events go in one cluster) normalised by the number of events, and then computing the centroid.