next up previous contents
Next: Feature selection and learning Up: Example application Previous: Clustering algorithm

Event attribution

The output of the k-means algorithm is a set of clusters, each cluster described in terms of its centroid (described as a point in the parameter space) and the points that belong to cluster. The standard deviation for each dimension of each cluster is also computed.

It is possible to make a rough estimate of the confidence of membership of a particular event to a given cluster. This estimate is based on the following assumptions:

Our estimate of confidence is based on calculating the probability that an element of this cluster could be as far or further out from the centroid of this cluster as the given event.

   figure750
Figure 6.4: An example illustrating how the confidence estimate can be made

In figure 6.4gif let x1 and x2 be two dimensions of the parameter space. The probability we are trying to estimate is the probability that a point in the shaded area belongs to the cluster. Let tex2html_wrap_inline2039 be the standard deviation of x1 and tex2html_wrap_inline2043 be the standard deviation of x2. We can express the difference along the x1 axis between p and c as a multiple of the standard deviation, so that tex2html_wrap_inline2053 . Similarly, we can express the distance along the x2 axis as tex2html_wrap_inline2057 . If x1 and x2 are independent and the cluster has a Gaussian distribution, then we can estimate the probability that an element belonging to the cluster is in the grey area as:

displaymath2027

where Z is the standard normal distribution (i.e. a normal distribution with a mean of zero and a standard deviation of 1).

In other words, it's the probability that the point of consideration is more than a standard deviations away from the centroid in the x1 dimension and more than b standard deviations away from the centroid in the x2 dimension. The values of tex2html_wrap_inline2071 can be found easily using table lookup. This method can be generalised to higher dimensions.

To understand how this works, consider some examples: imagine we have an event that matches the centroid of the cluster exactly. Then a = b = 0. But tex2html_wrap_inline2075 will occur with a probability 1. Thus our confidence that an event that matches the centroid of the cluster belongs to the cluster is 1. This is intuitive, since the centroid can be thought of as the most obvious example of an element belonging to the cluster.

Now consider what happens when p is tex2html_wrap_inline2039 away from c in the x1 dimension and tex2html_wrap_inline2043 in the x2 dimension. From a statistics text it can be determined that tex2html_wrap_inline2089 . Thus the probability that a point lies tex2html_wrap_inline2091 or further away from the centroid, but still belongs to this cluster is tex2html_wrap_inline2093 . Alternatively, a point which is tex2html_wrap_inline2095 away in the x1 dimension, but has the same x2 value as the centroid will have a confidence of tex2html_wrap_inline2101 .

For convenience, we take the logarithm of the confidence. This simplifies the mathematics (since multiplication becomes addition) and also avoids underflow problems (with very small probabilities) when these techniques are implemented using floating point operations. Thus, since the range of confidence is [0..1], the range of the logarithm of the confidence is tex2html_wrap_inline2105 .

Again, we use a per-class technique. Thus we only check for the presence or absence of clusters coming from the appropriate class. For each training instance and for each cluster, we find the event (if there is one) in that training instance which has the highest probability of belonging to that cluster. This becomes our estimate of the confidence that the training instance has an event belonging to that cluster.


next up previous contents
Next: Feature selection and learning Up: Example application Previous: Clustering algorithm

Mohammed Waleed Kadous
Tue Oct 6 13:04:40 EST 1998