K-Means is a simple algorithm, the pseudocode for which can be found
in Figure 4.11. Its initial settings (e.g. initial cluster membership
and number of clusters), however, are set using problem-specific
information. The K-Means implementation is largely historical and is
of limited practical use in the current version of TClass, its role
having been largely replaced by the E-M algorithm.
In our implementation, K-Means accepts the following parameters:
- numClusters: The number of clusters to create, given either as a
number or as the value ``auto''. Since K-Means has no mechanism for
estimating the number of clusters, some alternative means must be
used. With ``auto'', the number of clusters is taken to be the average
number of instantiated features of the type we are interested in per
training stream. For instance, in Table 4.2 there are a total of 12
instantiated features from 6 training streams, and hence it would
calculate that there are $12/6 = 2$ clusters of data.
Default value is ``auto''.
- initialdist: The initial distribution of points (i.e., which
cluster each point starts in) can be set in two ways: random, where
each point is allocated to a random cluster, or ordered. In the
ordered method, just as we ``guessed'' at the number of clusters
using the average number of instances, we also allocate cluster
membership based on the order of events. For instance, if the number
of clusters is calculated automatically to be two, and there are two
events, then across all of the data we put the first instantiated
feature detected into the first cluster and the second into the
second cluster, which gives a reasonable initial distribution of
instantiated features. Default value is ordered.
- closeness: How the distance between an instantiated feature and a
cluster is measured when evaluating membership: either the distance
from the cluster centroid to the instantiated feature being
considered, or the distance to the nearest point belonging to the
cluster. Default value is distance from the centroid.
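The three parameters above can be illustrated with a minimal Python sketch. This is not the TClass implementation: the function names, the value names "centroid" and "nearest", the rounding of a non-integer average, and the handling of streams with more features than clusters are all assumptions made for illustration, and the points are made up (sized only to match the 12 features in 6 streams of the example).

```python
import math
import random


def auto_num_clusters(streams):
    """'auto' rule: average instantiated features per training stream.

    Rounding to the nearest integer is an assumption; the text only
    covers a case where the average is already a whole number.
    """
    total = sum(len(stream) for stream in streams)
    return max(1, round(total / len(streams)))


def initial_assignment(streams, k, initialdist="ordered", seed=None):
    """Return, for each stream, one cluster index per instantiated feature."""
    if initialdist == "random":
        rng = random.Random(seed)
        return [[rng.randrange(k) for _ in stream] for stream in streams]
    # ordered: the i-th feature detected in a stream starts in cluster i.
    # Sending surplus features to the last cluster is an assumption.
    return [[min(i, k - 1) for i in range(len(stream))] for stream in streams]


def cluster_distance(point, members, closeness="centroid"):
    """Distance from a point to a non-empty cluster under either rule."""
    if closeness == "centroid":
        centroid = [sum(axis) / len(members) for axis in zip(*members)]
        return math.dist(point, centroid)
    # "nearest" (hypothetical value name): distance to the closest member.
    return min(math.dist(point, m) for m in members)


# Made-up 2-D points: 6 streams with 2 instantiated features each (12 total).
streams = [[(1.0 + 0.1 * s, 0.0), (5.0 + 0.1 * s, 0.0)] for s in range(6)]

k = auto_num_clusters(streams)          # 12 / 6 = 2
start = initial_assignment(streams, k)  # [0, 1] for every stream
```

The two closeness rules can disagree noticeably: for a cluster containing (2, 0) and (4, 0), the origin is 3 units from the centroid but only 2 units from the nearest member.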
Mohammed Waleed Kadous
2002-12-10