Now, we do the same thing, using a directed segmentation approach. We
will set
,
and
. We will
look at several disparity measures.
One advantage of directed over undirected segmentation is that each centroid it chooses is an observed instantiated feature. Quinlan [Qui93] notes that experts examining the results of learning with continuous attributes find cutpoints that actually occur in the data easier to understand than averages, since some averages may be physically impossible (e.g. in the previous result, a LoudRun with a duration of 3.33 timesteps). Although his observation concerned cutpoints and ours concerns regions, the same point holds. We will use the random search algorithm proposed above, but run three iterations in parallel for ease of comparison.
The first stage of the algorithm is to generate 3 random subsets, each containing either 2 or 3 elements. The results of this step are shown in Table 4.5.
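The subset-generation step can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function name, its parameters and the example points are all hypothetical, and it assumes the subsets are drawn without replacement from the observed instantiated features.

```python
import random

def random_centroid_subsets(instances, num_trials=3, min_size=2, max_size=3, seed=None):
    """Draw num_trials random subsets of the observed instances.

    Each subset is a candidate set of centroids. Names and defaults are
    illustrative assumptions, not taken from the thesis.
    """
    rng = random.Random(seed)
    subsets = []
    for _ in range(num_trials):
        size = rng.randint(min_size, max_size)       # inclusive on both ends
        subsets.append(rng.sample(instances, size))  # sampling without replacement
    return subsets

# Hypothetical 2-D instantiated features (not the thesis data).
instances = [(1, 0.2), (2, 0.9), (3, 0.4), (4, 0.8), (5, 0.1), (6, 0.7)]
trials = random_centroid_subsets(instances, seed=0)
```

Because every centroid is sampled from `instances`, each one is guaranteed to be a point that actually occurs in the data.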
For each of these, it is possible to generate a region boundary diagram. Figures 4.14, 4.15 and 4.16 show the parameter space for trials 1, 2 and 3 respectively.
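One way to read the region boundary diagrams is that each point in parameter space belongs to the region of its nearest centroid. The sketch below assumes plain Euclidean distance and breaks ties in favour of the lowest-numbered centroid; both choices are assumptions made for illustration, and the function name is hypothetical.

```python
import math

def nearest_region(point, centroids):
    """Return the index of the centroid closest to point.

    Assumes Euclidean distance; ties go to the lowest index.
    """
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

# Hypothetical centroids and query points.
centroids = [(1.0, 1.0), (5.0, 5.0)]
region_a = nearest_region((2.0, 1.0), centroids)
region_b = nearest_region((4.0, 6.0), centroids)
```

If the attributes have very different scales, the distances would need to be normalised first; that detail is outside this sketch.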
Now we can calculate the contingency tables for each trial. We do this by counting the number of instances of each class in each region. For example, in Table 4.6, the region around centroid 1 contains 1 instantiated feature that came from a stream whose class was Angry, and 3 whose original class was Happy. These go into the first column. Similarly, in region 2 there are 7 "Angry" instantiated features and no "Happy" instantiated features (counting the border case as belonging to this region). These go into the second column. The final column holds the row totals and the final row holds the column totals. The value in the bottom right-hand corner is the sum of either the row totals or the column totals; these should of course be equal, and should match the total number of instantiated features.
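The counting procedure can be sketched in a few lines. The function name and the zero-based region indices are assumptions made for illustration; the example uses only the four counts quoted in the prose above (1 and 3 in the first region, 7 and 0 in the second).

```python
from collections import Counter

def contingency_table(pairs, classes, num_regions):
    """Build a class-by-region count table with a totals column and a totals row.

    pairs is a list of (class_label, region_index) tuples, one per
    instantiated feature.
    """
    counts = Counter(pairs)
    table = []
    for cls in classes:
        row = [counts[(cls, r)] for r in range(num_regions)]
        table.append(row + [sum(row)])                 # append the row total
    table.append([sum(col) for col in zip(*table)])   # append the column totals
    return table

pairs = [("Angry", 0)] + [("Happy", 0)] * 3 + [("Angry", 1)] * 7
table = contingency_table(pairs, ["Angry", "Happy"], 2)
# table == [[1, 7, 8], [3, 0, 3], [4, 7, 11]]
```

The bottom-right entry is produced by summing the totals column, so it equals the number of pairs passed in.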
Finally, we can calculate a number of disparity measures from the contingency tables, based on the methods discussed in Section 4.7. The results are shown in Table 4.9.
For example, the information gain for the first trial can be computed using:
The gain ratio can be computed by dividing the information gain by the split information, which in this case is:
. Hence the gain ratio is
.
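Both measures can be computed directly from a contingency table. The sketch below uses the standard C4.5-style definitions with base-2 logarithms, which may differ in detail from the formulation in Section 4.7; the function names are hypothetical, and the table passed in holds counts only (no totals row or column).

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a list of non-negative counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def information_gain(table):
    """table[i][j] is the count of class i in region j."""
    n = sum(sum(row) for row in table)
    class_totals = [sum(row) for row in table]
    remainder = sum(
        sum(col) / n * entropy(col) for col in zip(*table) if sum(col) > 0
    )
    return entropy(class_totals) - remainder

def gain_ratio(table):
    """Information gain divided by the entropy of the region sizes."""
    region_totals = [sum(col) for col in zip(*table)]
    split_info = entropy(region_totals)
    return information_gain(table) / split_info if split_info > 0 else 0.0

# Hypothetical counts: two classes, two perfectly pure regions.
pure = [[4, 0], [0, 4]]
gain, ratio = information_gain(pure), gain_ratio(pure)
```

Dividing by the split information penalises segmentations that achieve purity only by using many small regions, which is the tradeoff between numbers of centroids that these measures are meant to capture.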
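Computing the χ² heuristic is more involved. For the first cell in trial 1, we first find the expected count: the product of the cell's row total and column total, divided by the overall total. We can then compute the χ² component for that particular cell as the squared difference between the observed and expected counts, divided by the expected count. If we repeat this for the remaining three cells in trial 1 and sum the components, we get a total of 9.79. This number is known as the χ² statistic.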
Note also that there are two classes and two regions, hence the number of degrees of freedom of the statistic is (2 − 1) × (2 − 1) = 1.
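The cell-by-cell computation can be sketched as below, assuming the statistic in question is Pearson's chi-squared. The function name is hypothetical, the input table holds counts only (no totals), and the example counts are invented for illustration rather than taken from the thesis tables.

```python
def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom.

    table[i][j] is the observed count of class i in region j.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip cells in empty rows or columns
                statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof

# Hypothetical counts: two classes, two perfectly pure regions.
statistic, dof = chi_squared([[10, 0], [0, 10]])
```

With three regions and two classes, the same function returns two degrees of freedom, so results from trials with different numbers of centroids are not directly comparable until they are converted to probabilities.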
Usually this statistic is compared against a value in a critical-values table in a significance test. For example, it is used to decide whether there is a less than 5 per cent chance that such a distribution is random. However, we want to know, for a particular χ² statistic and number of degrees of freedom, the probability that the observed distribution was the result of random chance. This can be computed easily using the cumulative distribution function of the χ² distribution. The result is shown in the fourth column of Table 4.9. Finally, for comparison, we take the negative log of the probability. This is mainly to match the direction of the other heuristics (bigger is better), and because it is just as easy to compute without suffering floating-point difficulties in the implementation.
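As a rough sketch of this step, the chance probability is the upper tail of the chi-squared distribution (one minus the cumulative distribution function). The code below evaluates it for integer degrees of freedom using only the standard library; the function names and the choice of natural logarithm are assumptions, and the statistic 9.79 with one degree of freedom is the value quoted in the text.

```python
import math

def chi2_sf(x, dof):
    """Upper-tail probability P(X >= x) for chi-squared with integer dof >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y)
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < dof / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

def neg_log_p(x, dof):
    """Negative natural log of the chance probability (bigger is better)."""
    p = chi2_sf(x, dof)
    return -math.log(p) if p > 0 else math.inf

score = neg_log_p(9.79, 1)
```

This version computes the probability first and then takes its log, so it can underflow for very large statistics; an implementation that cares about the floating-point issue would compute the log of the tail probability directly.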
Looking at Table 4.9, it seems that the results for Trial 3 are inferior to the other two. However, Trials 1 and 2 are quite close; information gain puts Trial 2 ahead of Trial 1, unlike the other two measures. A tradeoff has occurred: even though Trial 2 produced a "purer" result (in that there is only one class in each region), it did so using 3 centroids, whereas Trial 1 accomplishes an "almost as pure" result using only 2 centroids.
These regions, much as for the previous case with K-means, can now be used to create synthetic events. If we use Trial 2, for example, we get the features shown in Table 4.10.
Feeding these features to C4.5 produces the result shown in Figure 4.17. Note that we can easily post-process this result into something far more readable by using the LoudRun metafeature, as shown in Figure 4.18.
Note that if we did the same for the K-means value, we would get the slightly nonsensical definition shown in Figure 4.19.