Another approach, this one from statistics, is the chi-square test. The chi-square test measures the difference between the observed and the expected values in each of the cells of the contingency table. Comparing the resulting statistic against the chi-square distribution gives the probability that the distribution of instantiated features we see in the contingency table arose by chance. The smaller this probability is, the less likely it is that the distribution is due to chance. This probability is called the significance level (or p-value) of the test.
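The $\chi^2$ statistic can be computed as:
\[
  \chi^2 = \sum_{i} \sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
\]
where $O_{ij}$ is the observed number of instantiated features
belonging to region $i$ in class $j$; i.e. the corresponding cell of
the contingency table, and $E_{ij}$ is the number of instantiated
features that we would expect for region $i$ in class $j$ if the
region was independent of the class. Hence
\[
  E_{ij} = \frac{\left(\sum_{j'} O_{ij'}\right) \left(\sum_{i'} O_{i'j}\right)}{N}
\]
where $N = \sum_{i} \sum_{j} O_{ij}$ is the total number of
instantiated features in the table.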
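The computation described above can be sketched in plain Python. This is only an illustrative sketch: the function name `chi_square_statistic`, the list-of-lists layout (rows as regions, columns as classes) and the example counts are assumptions made here, not taken from the text.

```python
def chi_square_statistic(table):
    """Return the chi-square statistic of a contingency table.

    table[i][j] is the observed count for region i and class j
    (this row/column layout is an assumption of the sketch).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    if total == 0:
        raise ValueError("contingency table contains no observations")

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if region and class were independent.
            expected = row_totals[i] * col_totals[j] / total
            # Cells in an all-zero row or column have expected == 0; skip them.
            if expected > 0:
                statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical table with 2 regions and 2 classes (made-up counts).
example_table = [[10, 20],
                 [20, 10]]
print(chi_square_statistic(example_table))
```

In this made-up table every expected count is 15, so each of the four cells contributes 25/15 and the statistic is 20/3, roughly 6.67.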
In statistics, there is not one chi-square distribution but one for
each number of degrees of freedom. It can be shown that the degrees
of freedom in this case are $(r - 1)(c - 1)$, where $r$ is the number
of regions and $c$ is the number of classes. Once we have computed the
$\chi^2$ statistic for our contingency table, we can compute, from the
definition of the $\chi^2$ distribution, the probability that this
particular contingency table was the result of random chance. The
smaller this probability is, the more confident we can be that the
distribution we have before us is likely to be useful for
discriminative purposes.
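As a rough illustration of this step, the sketch below turns a statistic and its degrees of freedom into a tail probability using only the Python standard library. The helper names `degrees_of_freedom` and `chi_square_p_value` are hypothetical. The series used for the incomplete gamma function is one standard way to evaluate the integral, not necessarily the method intended by the text. It is adequate for moderate statistics only, since very large values overflow and very small probabilities lose precision.

```python
import math


def degrees_of_freedom(table):
    """(rows - 1) * (columns - 1) for a rectangular contingency table."""
    return (len(table) - 1) * (len(table[0]) - 1)


def chi_square_p_value(statistic, dof):
    """Upper-tail probability P(X >= statistic) for X ~ chi-square(dof)."""
    if dof < 1:
        raise ValueError("dof must be at least 1")
    if statistic <= 0:
        return 1.0
    a = dof / 2.0
    x = statistic / 2.0
    # Regularized lower incomplete gamma function via its power series:
    #   P(a, x) = x^a e^(-x) / Gamma(a + 1) * sum_{n>=0} x^n / ((a+1)...(a+n))
    term = 1.0   # n = 0 term of the sum
    total = 1.0
    n = 1
    while term > 1e-16 * total:
        term *= x / (a + n)
        total += term
        n += 1
    log_prefactor = a * math.log(x) - x - math.lgamma(a + 1.0)
    cdf = math.exp(log_prefactor) * total
    # Clamp to guard against tiny negative values from rounding.
    return max(0.0, 1.0 - cdf)


# Hypothetical usage: a 2 x 2 table whose statistic is 20/3.
example_table = [[10, 20],
                 [20, 10]]
dof = degrees_of_freedom(example_table)      # (2 - 1) * (2 - 1) = 1
p_value = chi_square_p_value(20 / 3, dof)
print(dof, p_value)
```

For this made-up table the probability comes out just under 0.01. In practice a library routine such as SciPy's `scipy.stats.chi2.sf` would normally be preferred over a hand-written series.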
Unlike information gain, the $\chi^2$ test does not suffer
from the same bias towards more regions, at least theoretically.
However, it is more difficult and time-consuming to calculate than the
information gain or gain ratio, since computing the probability
mentioned above requires integrating the probability density
function of the $\chi^2$ distribution.