Propositional Learning Systems

Reference: Bratko section 18.5

Aim:
To describe an algorithm whose input is a collection of instances and their correct classifications, and whose output is one or more sentences in some form of logic describing each of the possible output classes in terms of the input attributes.
Keywords: Aq, conjunctive expressions, covering algorithm, instances, propositional learning systems, classes in classification tasks

Rather than searching for discriminant functions, symbolic learning systems find expressions equivalent to sentences in some form of logic. For example, we may distinguish objects according to two attributes: size and colour. We may say that an object belongs to class 3 if its colour is red and its size is very small to medium. Following the notation of Michalski (1983), the classes in Figure 1 may be written as expressions of this kind; for example, for class 3:

        class3 ← [colour = red][size = very_small .. medium]

Note that this kind of description partitions the universe into blocks, unlike the function approximation methods that find smooth surfaces to discriminate classes.
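
As a concrete illustration (my encoding, not the source's), such a block-shaped class description is just a Boolean test on the attributes; the ordering assumed for the size values and the name is_class3 are hypothetical:

        # The class 3 description from the text: colour is red and
        # size is very small to medium.
        SIZES = ["very_small", "small", "medium", "large", "very_large"]  # assumed ordering

        def is_class3(obj):
            return (obj["colour"] == "red"
                    and SIZES.index(obj["size"]) <= SIZES.index("medium"))

        print(is_class3({"colour": "red", "size": "small"}))   # True
        print(is_class3({"colour": "red", "size": "large"}))   # False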

Interestingly, one of the popular early machine learning algorithms, Aq (Michalski, 1973), had its origins in switching theory. One of the concerns of switching theory is to find ways of minimising logic circuits, that is, simplifying the truth table description of the function of a circuit to a simple expression in Boolean logic. Many of the algorithms in switching theory take tables like Figure 1 and search for the best way of covering all of the entries in the table.

Figure 1: Discrimination on attributes and values

Aq uses a covering algorithm to build its concept description:

          cover := {};
          repeat
                  select one positive example, e;
                  construct the set of all conjunctive expressions
                         that cover e and no negative example in E-;
                  choose the `best' expression, x, from this set;
                  add x as a new disjunct of the concept;
                  remove all positive examples covered by x
          until there are no positive examples left;

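The pseudocode can be made concrete. The following runnable sketch is not Michalski's implementation: it assumes examples are attribute-value dictionaries, represents a conjunctive expression as a dictionary from attributes to sets of allowed values, builds candidates only from single conditions of the selected seed, and takes the `best' expression to be the one covering the most positives with the fewest conditions. The real Aq constructs a star of maximally general complexes; this stand-in mirrors only the covering structure.

        def covers(expr, example):
            """True if the example satisfies every condition in expr."""
            return all(example[a] in allowed for a, allowed in expr.items())

        def learn_cover(positives, negatives, attributes):
            cover = []                       # disjuncts found so far
            remaining = list(positives)
            while remaining:                 # repeat ... until no positives left
                seed = remaining[0]          # select one positive example, e
                # Candidate expressions: single conditions taken from the seed
                # that cover no negative example (the set E- in the pseudocode).
                candidates = [{a: {seed[a]}} for a in attributes
                              if not any(covers({a: {seed[a]}}, n) for n in negatives)]
                if not candidates:
                    # Fall back to the fully specific description of the seed.
                    candidates = [{a: {seed[a]} for a in attributes}]
                # Choose the `best' expression: many positives, few conditions.
                best = max(candidates,
                           key=lambda x: (sum(covers(x, p) for p in remaining),
                                          -len(x)))
                cover.append(best)           # add best as a new disjunct
                remaining = [p for p in remaining if not covers(best, p)]
            return cover

        positives = [{"size": "small", "colour": "red"},
                     {"size": "medium", "colour": "red"}]
        negatives = [{"size": "large", "colour": "blue"}]
        print(learn_cover(positives, negatives, ["size", "colour"]))
        # -> [{'colour': {'red'}}]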

The `best' expression is usually some compromise between the desire to cover as many positive examples as possible and the desire to have as compact and readable a representation as possible. In designing Aq, Michalski was particularly concerned with the expressiveness of the concept description language.

A drawback of the Aq learning algorithm is that, unlike ID3, it does not use the statistical information present in the training sample to guide induction.

Summary:
Aq iteratively constructs a logic-based description of concepts on the basis of instances described by features.

APPENDIX

Michalski has always argued in favour of rule-based representations over tree-structured representations, on the grounds of readability. When the domain is complex, decision trees can become very `bushy' and difficult to understand, whereas rules tend to be modular and can be read in isolation from the rest of the knowledge base constructed by induction. On the other hand, decision tree induction programs are usually very fast. A compromise is to use decision tree induction to build an initial tree and then derive rules from the tree, thus transforming an efficient but opaque representation into a transparent one (Quinlan, 1987).
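
As a sketch of this tree-to-rules step (the representation and names here are hypothetical, and Quinlan's method additionally simplifies the extracted rules), a decision tree can be flattened into one rule per root-to-leaf path:

        # A toy tree: (attribute, {value: subtree-or-class-label}).
        def tree_to_rules(tree, conditions=()):
            if not isinstance(tree, tuple):          # leaf: a class label
                return [(conditions, tree)]
            attr, branches = tree
            rules = []
            for value, subtree in branches.items():
                rules += tree_to_rules(subtree, conditions + ((attr, value),))
            return rules

        tree = ("colour", {"red": ("size", {"small": "class3", "large": "class1"}),
                           "blue": "class2"})
        for conds, label in tree_to_rules(tree):
            print(" AND ".join(f"{a} = {v}" for a, v in conds), "->", label)
        # colour = red AND size = small -> class3
        # colour = red AND size = large -> class1
        # colour = blue -> class2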

It is instructive to compare the shapes that are produced by various learning systems when they partition the universe. Figure 1 illustrates one weakness of decision tree and other symbolic classification methods: since they approximate partitions with rectangles (if the universe is 2-dimensional), there is an inherent inaccuracy when dealing with domains with continuous attributes. Function approximation methods and instance-based learning (IBL) may be able to attain higher accuracy, but at the expense of the transparency of the resulting theory. It is more difficult to make general comments about genetic algorithms, since the encoding method will affect both accuracy and readability.

As we have seen, useful insights into induction can be gained by visualising it as searching for a cover of objects in the universe. Unfortunately, there are limits to this geometric interpretation of learning. If we wish to learn concepts that describe complex objects and relationships between the objects, it becomes very difficult to visualise the universe. For this reason, it is often useful to rely on reasoning about the concept description language.

As we saw, the cover in Figure 1 can be expressed as clauses in propositional logic. We can establish a correspondence between sentences in the concept description language (the hypothesis language) and a diagrammatic representation of the concept. More importantly, we can create a correspondence between generalisation and specialisation operations on the sets of objects and generalisation and specialisation operations on the sentences of the language.

Figure 2: Generalisation as set covering

For example, Figure 2 shows two sets, labelled class 1 and class 2. Class 1 is a generalisation of class 2, since it includes all of the objects of class 2 together with others in the universe. Conversely, we call class 2 a specialisation of class 1. By convention, we say the description of class 1 is a generalisation of the description of class 2. Thus,

        class1 ← [colour = red]                        (1)

is a generalisation of

        class2 ← [colour = red][size = small]          (2)

Once we have established the correspondence between sets of objects and their descriptions, it is often convenient to forget about the objects and work only with expressions in the language. The reason is simple: beyond a certain point of complexity, it is not possible to visualise sets, but it is relatively easy to apply simple transformations to sentences in a formal language. For example, clause (2) can be generalised very easily to clause (1) by dropping one of the conditions.
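
Under the same dictionary encoding used in the earlier sketch (an assumption, not the notation of the source), dropping a condition is simply removing a key, and anything covered by the specialised clause remains covered by its generalisation:

        def covers(expr, example):
            return all(example[a] in allowed for a, allowed in expr.items())

        clause2 = {"colour": {"red"}, "size": {"small"}}             # clause (2)
        clause1 = {a: v for a, v in clause2.items() if a != "size"}  # clause (1): drop a condition

        example = {"colour": "red", "size": "large"}
        print(covers(clause2, example))  # False: fails the size condition
        print(covers(clause1, example))  # True: the generalisation covers it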
