
Expectation-Maximisation

Expectation-Maximisation can be thought of as the K-Means algorithm augmented with a way of choosing the number of clusters. The approach begins by clustering the data into two clusters, and then tries three. If the three-cluster model performs better than the two-cluster model on a holdout set of unseen data (or, alternatively, under cross-validation), then three clusters are used. This process is repeated for four clusters, and so on, until an optimal number of clusters is found.
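The selection loop described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the Weka code: it assumes one-dimensional data, Gaussian mixture components, and average held-out log-likelihood as the measure of "performs better". The names fit_em, cv_score and select_k, and the parameters folds, max_k, iters and min_var, are hypothetical and chosen for this example.

```python
import math
import random

def log_gauss(x, mu, var):
    # Log density of a 1-D Gaussian with mean mu and variance var.
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def logsumexp(vals):
    # Numerically stable log(sum(exp(v))).
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def fit_em(data, k, rng, iters=100, min_var=1e-3):
    # Fit a k-component 1-D Gaussian mixture by E-M (illustrative only).
    n = len(data)
    centre = sum(data) / n
    spread = max(sum((x - centre) ** 2 for x in data) / n, min_var)
    means = rng.sample(data, k)      # start from k randomly chosen points
    variances = [spread] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            logs = [math.log(weights[j]) + log_gauss(x, means[j], variances[j])
                    for j in range(k)]
            norm = logsumexp(logs)
            resp.append([math.exp(l - norm) for l in logs])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue             # leave an empty component unchanged
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
                min_var)
            weights[j] = nj / n
        total = sum(weights)
        weights = [w / total for w in weights]
    return weights, means, variances

def avg_log_likelihood(data, model):
    # Mean log-likelihood of the data under a fitted mixture.
    weights, means, variances = model
    return sum(
        logsumexp([math.log(w) + log_gauss(x, m, v)
                   for w, m, v in zip(weights, means, variances)])
        for x in data) / len(data)

def cv_score(data, k, folds=3, seed=0):
    # Average held-out log-likelihood over the cross-validation folds.
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    scores = []
    for f in range(folds):
        test = shuffled[f::folds]
        train = [x for i, x in enumerate(shuffled) if i % folds != f]
        scores.append(avg_log_likelihood(test, fit_em(train, k, rng)))
    return sum(scores) / folds

def select_k(data, max_k=10, folds=3, seed=0):
    # Start with two clusters; keep adding one while the score improves.
    best_k, best = 2, cv_score(data, 2, folds, seed)
    for k in range(3, max_k + 1):
        score = cv_score(data, k, folds, seed)
        if score <= best:
            break
        best_k, best = k, score
    return best_k
```

Calling select_k(data) on a list of floats returns the chosen number of clusters. The loop stops at the first cluster count that fails to improve the held-out score, which is one reasonable reading of "until an optimal number of clusters is found"; other stopping rules are possible.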

We do not have our own implementation of the E-M algorithm. Rather, we rely on the one supplied with Weka, with some minor modifications (namely, 3-fold cross-validation is used to assess the number of clusters, rather than 10-fold cross-validation). It accepts the following settings:



Mohammed Waleed Kadous 2002-12-10