Given this notation, we can define the following information measures for the cells, the class and the regions:
The information gain is the information about the class minus the information stored in the cells, that is to say, the reduction in information about the class obtained by subdividing the space into regions.
This is the heuristic originally used by Quinlan in ID3
[Qui86]. However, it has one significant drawback: it does
not take the number of regions into account. If information gain were
used "raw", without regard to the number of regions, it would be
biased towards subdivisions with a very large number of regions.
Imagine one region for each point in the space, so that each region
has only one instantiated feature and therefore only one class.
Substituting into the formula above, such a choice of regions has an
information gain of 1, the maximum possible, yet it is of no use for
classification. Hence, in C4.5, Quinlan introduced the gain ratio,
which compensates for the number of regions by normalising by the
information encoded in the split itself, that is, the information
about the regions. Using the measures defined above, the gain ratio is
therefore the information gain divided by the information about the
regions.
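A minimal Python sketch of both measures may help. It assumes that information is Shannon entropy in bits, that the information stored in the cells is the class entropy within each region weighted by that region's share of the points, and that the information about the regions is the entropy of the region labels. These follow Quinlan's standard definitions and are not necessarily the exact notation used here; the function names are hypothetical. The example reproduces the one-region-per-point case: both subdivisions reach the maximal information gain of 1, but the gain ratio penalises the eight singleton regions.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of the empirical distribution of labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(classes, regions):
    """Class information minus the information stored in the cells.

    Assumption: the cell information is the class entropy inside each
    region, weighted by the fraction of points falling in that region.
    """
    n = len(classes)
    cells = {}
    for c, r in zip(classes, regions):
        cells.setdefault(r, []).append(c)
    within = sum(len(cs) / n * entropy(cs) for cs in cells.values())
    return entropy(classes) - within


def gain_ratio(classes, regions):
    """Information gain normalised by the information about the regions."""
    split_info = entropy(regions)
    if split_info == 0:
        # A single region carries no split information; returning 0.0 is
        # a hypothetical convention for this sketch.
        return 0.0
    return information_gain(classes, regions) / split_info


if __name__ == "__main__":
    classes = [0, 0, 0, 0, 1, 1, 1, 1]
    one_region_per_point = list(range(8))
    two_pure_regions = ["a"] * 4 + ["b"] * 4
    print(information_gain(classes, one_region_per_point))  # 1.0
    print(gain_ratio(classes, one_region_per_point))  # about 0.333
    print(information_gain(classes, two_pure_regions))  # 1.0
    print(gain_ratio(classes, two_pure_regions))  # 1.0
```

Both subdivisions separate the classes perfectly, so raw information gain cannot tell them apart; dividing by the split information (3 bits for eight singleton regions, 1 bit for two regions) ranks the two-region subdivision higher.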
In these experiments with metafeatures, therefore, we used the gain ratio as one of our disparity measures. The higher the gain ratio, the more likely it is that the subdivision into regions is useful for classification.