A second possible refinement is to create bounds on the variation of parameters when converting rules back to human-readable form.
In the human-readable definition generated in Figure 4.18, no guide is given to the typical variation in attribute values. All that can be said is that it occurs at ``approximately'' time 10 and lasts for ``approximately'' 4 timesteps. In real-world datasets, this may not be a good enough characterisation; in some cases ``approximately'' 10 means ``between 9 and 11'', and in other cases it means ``between 5 and 15''. How can we give the user some idea of the extent of the variation?
One approach is to give the user the region boundaries as output. After all, this would provide exactly what the classifier uses. However, a moment's thought will show that this is a less than ideal solution. Firstly, most people do not have a sufficiently good grasp of mathematics to deal with a region boundary expressed in this form. Secondly, this may work for a two-dimensional metafeature like LoudRun or LocalMax, but many metafeatures are four- or five-dimensional. In a five-dimensional space, the region boundaries would be expressed as convex hyper-polyhedra. This representation is not likely to be of much use in practice.
However, there is no reason why the explanation of the concept learnt by our system needs to be expressed so that it exactly matches the classifier produced by our system. We can give the user a simplified explanation of what our system has learnt, but when it comes to actual classification, we need not use the simplified model at all.
This is a subtle idea. For example, a traditional learner, like C4.5, outputs the scheme that it actually uses to classify: a decision tree. Imagine if, for its own internal calculations, C4.5 retained the original tree, but applied various heuristics to reduce the size of the tree so that a human reader would get an approximation of what C4.5 had learnt, with enough of a characterisation that it's still useful.
This suggests an approach: assuming that the learner uses binary membership, one can find the bounding box of the instantiated features within each region. These boxes could then be included in the definition. Figure 4.22 shows the bounding boxes for each of the different centroids.
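The bounding-box computation can be sketched in a few lines. This is a minimal illustration, not the system's actual code: it assumes that binary membership means assigning each instance to its nearest centroid under Euclidean distance, and the function names, points and centroids are hypothetical.

```python
from collections import defaultdict


def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def region_bounding_boxes(points, centroids):
    """Map each region index to (mins, maxs) over the instances assigned to it.

    Assumption: binary membership, i.e. every instance belongs to exactly
    one region, the one whose centroid is nearest.
    """
    members = defaultdict(list)
    for point in points:
        members[nearest_centroid(point, centroids)].append(point)

    boxes = {}
    for region, pts in members.items():
        # zip(*pts) groups the values of each attribute across instances.
        mins = tuple(min(axis) for axis in zip(*pts))
        maxs = tuple(max(axis) for axis in zip(*pts))
        boxes[region] = (mins, maxs)
    return boxes


# Hypothetical (time, duration) instances and two centroids.
points = [(9, 3), (10, 4), (11, 5), (30, 1), (32, 2)]
centroids = [(10, 4), (31, 1.5)]
boxes = region_bounding_boxes(points, centroids)
```

Each box is axis-aligned, so it can be reported one attribute at a time, which is what makes it easier to read than the exact region boundary.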
It is an approximation of the actual concept used for classification, but it does give the user some intuition about what the system was doing. Using this, the original definition would take on the form shown in Figure 4.23.
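As a rough illustration of how such bounds might be attached to a definition, the sketch below renders a centroid and its bounding box as text. The wording of the output and the name `describe_with_bounds` are assumptions made for this example; they are not the format used in Figure 4.23.

```python
def describe_with_bounds(names, centroid, box):
    """Render each attribute as 'approximately c (between lo and hi)'.

    `box` is a (mins, maxs) pair of per-attribute bounds. The phrasing
    is hypothetical and chosen only for illustration.
    """
    mins, maxs = box
    parts = [
        f"{name} approximately {c:g} (between {lo:g} and {hi:g})"
        for name, c, lo, hi in zip(names, centroid, mins, maxs)
    ]
    return ", ".join(parts)


# Hypothetical centroid and bounds for a (time, duration) metafeature.
text = describe_with_bounds(("time", "duration"), (10, 4), ((9, 3), (11, 5)))
print(text)
```

Note that only the explanation changes here; the classifier would still use the original regions.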
There is a complication if we want to use relative membership and variation bounds at the same time. With binary membership, we can create the ``bounding box'' for all the instances in a region. However, if relative memberships are used for attribute values, our propositional rule learner may produce a rule of the form rgn2 >= 0.9 for Random Search Trial 2. This means that we can no longer use the bounding box for all of the instances in region 2 shown in Figure 4.22, as not all of the points in the region fulfil the requirement.
How do we solve the problem? We can still calculate the relative membership for each point in the region, and we only draw the bounding box around those points fulfilling the requirement. The relative membership for each point in region 2 is shown in Table 4.12.
Hence we can see that only the points [...] lie in the region defined by the constraint rgn2 >= 0.9. Hence the bounding box would be [...]. The human-readable version of the rule rgn2 >= 0.9 is shown in Figure 4.24.
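The restricted bounding box can be sketched as follows. The relative memberships are taken as given inputs, since how they are computed is not restated here; the points and membership values below are invented for illustration and are not those of Table 4.12.

```python
def thresholded_bounding_box(points, memberships, threshold):
    """Bounding box of the points whose relative membership >= threshold.

    Returns (mins, maxs), or None if no point satisfies the constraint.
    """
    kept = [p for p, m in zip(points, memberships) if m >= threshold]
    if not kept:
        return None
    mins = tuple(min(axis) for axis in zip(*kept))
    maxs = tuple(max(axis) for axis in zip(*kept))
    return mins, maxs


# Hypothetical points in one region with hypothetical relative memberships.
points = [(9, 3), (10, 4), (11, 5), (14, 7)]
memberships = [0.93, 1.0, 0.95, 0.62]

full_box = thresholded_bounding_box(points, memberships, 0.0)
tight_box = thresholded_bounding_box(points, memberships, 0.9)
```

In this made-up example the point with membership 0.62 is excluded by a rule such as rgn2 >= 0.9, so the reported bounds tighten accordingly.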
So, rather than being a problem, the use of relative region membership allows us to build more exact definitions, especially with regard to the bounds, while also giving the learner more flexibility.