Aim: |
To describe an algorithm whose input is a collection of instances and their correct classification and whose output is one or more sentences in some form of logic, describing each of the possible output classes in terms of the input attributes. |
Keywords: Aq, conjunctive expressions, covering algorithm, instances, propositional learning systems, classes in classification tasks |
Rather than searching for discriminant functions, symbolic learning systems find expressions equivalent to sentences in some form of logic. For example, we may distinguish objects according to two attributes: size and colour. We may say that an object belongs to class 3 if its colour is red and its size is very small to medium. Following the notation of Michalski (1983), the classes in Figure 1 may be written as conjunctions of attribute tests; for instance, class 3 corresponds to [colour = red][size = very_small .. medium].
Interestingly, one of the popular early machine learning algorithms, Aq (Michalski, 1973), had its origins in switching theory. One of the concerns of switching theory is to find ways of minimising logic circuits, that is, simplifying the truth table description of the function of a circuit to a simple expression in Boolean logic. Many of the algorithms in switching theory take tables like Figure 1 and search for the best way of covering all of the entries in the table.
Figure 1: Discrimination on attributes and values
Aq uses a covering algorithm to build its concept description:
    cover := {};
    repeat
        select one positive example, e;
        construct the set of all conjunctive expressions that cover e
            and no negative example in E-;
        choose the `best' expression, x, from this set;
        add x as a new disjunct of the concept;
        remove all positive examples covered by x
    until there are no positive examples left;

The `best' expression is usually some compromise between the desire to cover as many positive examples as possible and the desire to have as compact and readable a representation as possible. In designing Aq, Michalski was particularly concerned with the expressiveness of the concept description language.
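The covering loop above can be sketched in Python. This is a simplified illustration, not Michalski's original implementation: expressions are dictionaries of attribute tests, candidate expressions are built only from single-value tests drawn from the seed example (a much-reduced "star"), and `best` simply prefers expressions that cover many positives and are compact. All names are illustrative.

```python
from itertools import combinations

def covers(expr, example):
    """An expression covers an example if every attribute test matches."""
    return all(example.get(attr) in values for attr, values in expr.items())

def consistent(expr, negatives):
    """True if the expression covers no negative example."""
    return not any(covers(expr, n) for n in negatives)

def candidate_exprs(seed, negatives):
    """Conjunctions of attribute tests taken from the seed example
    that exclude every negative example (a simplified star)."""
    attrs = list(seed)
    for r in range(1, len(attrs) + 1):
        for subset in combinations(attrs, r):
            expr = {a: {seed[a]} for a in subset}
            if consistent(expr, negatives):
                yield expr

def aq_cover(positives, negatives):
    """Build a disjunction of conjunctive expressions covering all
    positives and no negatives (assumes such expressions exist)."""
    cover = []
    remaining = list(positives)
    while remaining:
        seed = remaining[0]                 # select one positive example
        exprs = list(candidate_exprs(seed, negatives))
        # 'best': cover as many positives as possible, then be compact
        best = max(exprs,
                   key=lambda e: (sum(covers(e, p) for p in remaining),
                                  -len(e)))
        cover.append(best)                  # add a new disjunct
        remaining = [p for p in remaining if not covers(best, p)]
    return cover

pos = [{"colour": "red", "size": "small"},
       {"colour": "red", "size": "medium"}]
neg = [{"colour": "blue", "size": "small"}]
```

On this toy data the loop terminates after one iteration with the single disjunct `{"colour": {"red"}}`, since a colour test alone already excludes the negative example and covers both positives.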
A drawback of the Aq learning algorithm is that, unlike ID3, it does not use statistical information present in the training sample to guide induction.
Summary: |
Aq iteratively constructs a logic-based description of concepts on the basis of instances described by features. |
It is instructive to compare the shapes that are produced by various learning systems when they partition the universe. Figure 1 demonstrates one weakness of decision trees and other symbolic classifiers. Since they approximate partitions with rectangles (in a two-dimensional universe), there is an inherent inaccuracy when dealing with domains with continuous attributes. Function approximation methods and IBL may be able to attain higher accuracy, but at the expense of transparency of the resulting theory. It is more difficult to make general comments about genetic algorithms, since the encoding method will affect both accuracy and readability.
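This rectangular inaccuracy can be made concrete with a small sketch. The diagonal boundary y > x below is hypothetical, chosen only to illustrate the point: a tree tests one attribute at a time, so its regions are axis-parallel rectangles, and any single such region only approximates a diagonal class boundary.

```python
# Hypothetical true concept: positive when y > x (a diagonal boundary).
def true_class(x, y):
    return y > x

# A one-split "tree": the axis-parallel half-plane y > 0.5.
# Its decision regions are rectangles, so it cannot follow the diagonal.
def tree_class(x, y):
    return y > 0.5

# Grid points where the rectangular approximation disagrees
# with the true diagonal boundary.
errors = [(x, y)
          for x in (0.1, 0.4, 0.6, 0.9)
          for y in (0.1, 0.4, 0.6, 0.9)
          if true_class(x, y) != tree_class(x, y)]
```

On this 4-by-4 grid the single split misclassifies four points, all of them near the diagonal; adding more splits shrinks the error but never removes it for continuous attributes.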
As we have seen, useful insights into induction can be gained by visualising it as searching for a cover of objects in the universe. Unfortunately, there are limits to this geometric interpretation of learning. If we wish to learn concepts that describe complex objects and relationships between the objects, it becomes very difficult to visualise the universe. For this reason, it is often useful to rely on reasoning about the concept description language.
As we saw, the cover in Figure 1 can be expressed as clauses in propositional logic. We can establish a correspondence between sentences in the concept description language (the hypothesis language) and a diagrammatic representation of the concept. More importantly, we can create a correspondence between generalisation and specialisation operations on the sets of objects and generalisation and specialisation operations on the sentences of the language.
Figure 2: Generalisation as set covering
For example, Figure 2 shows two sets, labelled class 1 and class 2. It is clear that class 1 is a generalisation of class 2 since it includes a larger number of objects in the universe. We also call class 2 a specialisation of class 1. By convention, we say the description of class 1 is a generalisation of the description of class 2. Thus,
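On a finite universe, this set-covering view of generalisation can be checked extensionally: description d1 generalises d2 exactly when the set of objects covered by d2 is a subset of the set covered by d1. The attribute names and sample universe below are illustrative, mirroring the colour/size example of Figure 1.

```python
def extent(description, universe):
    """Indices of the objects in the universe satisfying every
    attribute test in the description."""
    return {i for i, obj in enumerate(universe)
            if all(obj.get(a) in vals for a, vals in description.items())}

def generalises(d1, d2, universe):
    """d1 generalises d2 if d1 covers every object that d2 covers."""
    return extent(d2, universe) <= extent(d1, universe)

universe = [{"colour": "red", "size": "small"},
            {"colour": "red", "size": "large"},
            {"colour": "blue", "size": "small"}]

class1 = {"colour": {"red"}}                        # larger extent
class2 = {"colour": {"red"}, "size": {"small"}}     # a specialisation
```

Here `class2` adds an attribute test to `class1`, so its extent is strictly smaller: specialising the sentence shrinks the set of covered objects, and `generalises(class1, class2, universe)` holds while the converse does not.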