We use a simple example from Banerji (1980) to illustrate the use of background knowledge. There is a language for describing instances of a concept and another for describing concepts. Suppose we wish to represent the binary number 10 by a left-recursive binary tree of the digits `0' and `1':

[head: [head: 1; tail: nil]; tail: 0]

`head' and `tail' are the names of attributes. Their values follow the colon. The concepts of binary digit and binary number are defined as:

Thus, an object belongs to a particular class or concept if it satisfies the logical expression in the body of the description. Predicates in the expression may test the membership of an object in a previously learned concept.

Banerji always emphasised the importance of a description language that could `grow'. That is, its descriptive power should increase as new concepts are learned. This can clearly be seen in the example above. Having learned to describe binary digits, the concept of digit becomes available for use in the description of more complex concepts such as binary number.
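Banerji's growing language can be mimicked in a short sketch. The Python below is our own rendering, not Banerji's notation: the dictionary representation and the function names `is_digit` and `is_num` are assumptions. The point it illustrates is the one above — once `is_digit` is defined, the more complex concept `is_num` can be described in terms of it.

```python
# A sketch of a "growing" description language. Instances are nested
# dictionaries with 'head' and 'tail' attributes, mirroring the tree
# [head: [head: 1; tail: nil]; tail: 0] for the binary number 10.

def is_digit(x):
    """Concept 'digit': the object is 0 or 1."""
    return x in (0, 1)

def is_num(x):
    """Concept 'num' (binary number), defined *using* the previously
    learned concept 'digit': a leaf node holding a single digit, or a
    node whose head is a num and whose tail is a digit."""
    if not isinstance(x, dict):
        return False
    if x["tail"] is None:          # leaf: [head: d; tail: nil]
        return is_digit(x["head"])
    return is_num(x["head"]) and is_digit(x["tail"])

# The binary number 10 as a left-recursive tree:
ten = {"head": {"head": 1, "tail": None}, "tail": 0}
print(is_num(ten))  # True
```

Here the concept `num` is only expressible because `digit` was acquired first, which is exactly the extensibility Banerji argued for.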

Extensibility is a natural and easily implemented feature of Horn clause logic. In addition, a description in Horn clause logic is a logic program and can be executed. For example, to recognise an object, a Horn clause can be interpreted in a forward chaining manner. Suppose we have a set of clauses:

(3)
(4)

and an instance:

(5)

Clause (3) recognises the first two terms in expression (5), reducing it to

Clause (4) reduces this to C2. That is, clauses (3) and (4) recognise expression (5) as the description of an instance of concept C2.
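Since clauses (3) and (4) and expression (5) are not reproduced here, the following Python sketch uses hypothetical rules (`c1 :- a, b.` and `c2 :- c1, c.`) to illustrate the same forward chaining style of recognition: each rule replaces a recognised pattern of symbols with a concept name until no rule applies.

```python
# Forward-chaining recognition as term rewriting (hypothetical rules,
# not the clauses numbered (3)-(5) in the text). Each rule recognises
# a pattern of adjacent symbols and replaces it with a concept name.

RULES = [
    (("a", "b"), "c1"),   # a clause like  c1 :- a, b.
    (("c1", "c"), "c2"),  # a clause like  c2 :- c1, c.
]

def recognise(symbols):
    """Repeatedly replace a matching subsequence until no rule applies."""
    symbols = list(symbols)
    changed = True
    while changed:
        changed = False
        for pattern, head in RULES:
            n = len(pattern)
            for i in range(len(symbols) - n + 1):
                if tuple(symbols[i:i + n]) == pattern:
                    symbols[i:i + n] = [head]   # reduce the expression
                    changed = True
                    break
            if changed:
                break
    return symbols

print(recognise(["a", "b", "c"]))  # ['c2']
```

The first rule reduces `['a', 'b', 'c']` to `['c1', 'c']`; the second reduces that to `['c2']`, recognising the instance as a member of concept c2, just as clauses (3) and (4) recognise expression (5).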

When clauses are executed in a backward chaining manner, they can either verify that the input object belongs to a concept or produce instances of concepts. In other words, we attempt to prove that an assertion is true with respect to a background theory. Resolution (Robinson, 1965) provides an efficient means of deriving a solution to a problem, given a set of axioms which define the task environment. The algorithm takes two clauses containing complementary literals and resolves them by applying the literals' most general unifier, as illustrated in Figure 10 by the execution of a simple Prolog program.

Figure 10: A resolution proof tree from Muggleton and Feng (1990).

The box in the figure contains the clauses that make up the theory, or knowledge base, and the question to be answered, namely, "is it true that a hammer is heavier than a feather?" A resolution proof is a proof by refutation. That is, to answer the question, we assume that it is false and then see if the addition, to the theory, of this negative statement results in a contradiction.

The literals on the left hand side of a Prolog clause are positive. Those on
the right hand side are negative. The proof procedure looks for
*complementary* literals in two clauses, i.e. literals of opposite sign
that unify. In the example in Figure 10,

and

unify to create the first resolvent,

A side effect of unification is the creation of variable substitutions. By continued application of resolution, we can eventually derive the empty clause, which indicates a contradiction.
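A minimal Robinson-style unification routine can make the substitution mechanism concrete. The sketch below is our own rendering, not the Prolog engine's implementation: terms are Python tuples `(functor, arg1, ...)`, constants are lower-case strings, and variables are strings beginning with an upper-case letter; the occurs check is omitted for brevity, as it is in most Prolog systems.

```python
# A small sketch of Robinson's unification algorithm. Returns a most
# general unifier (a substitution dict) or None if unification fails.

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, subst):
    """Follow variable bindings to the term they stand for."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def unify(a, b, subst=None):
    subst = dict(subst or {})
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):                       # bind variable to term
        subst[a] = b
        return subst
    if is_var(b):
        subst[b] = a
        return subst
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):          # unify argument by argument
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None                         # clash of constants or functors

# heavier(hammer, X) unifies with heavier(hammer, feather) under {X/feather}:
print(unify(("heavier", "hammer", "X"),
            ("heavier", "hammer", "feather")))  # {'X': 'feather'}
```

The returned substitution is exactly the "side effect" mentioned above: the bindings accumulated along a resolution proof supply the answer to the original query.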

Plotkin's (1970) work "originated with a suggestion of R.J. Popplestone that since unification is useful in automatic deduction by the resolution method, its dual might prove helpful for induction. The dual of the most general unifier of two literals is called the least general generalisation". At about the same time that Plotkin took up this idea, J.C. Reynolds was also developing the use of least general generalisations. Reynolds (1970) also recognised the connection between deductive theorem proving and inductive learning:

Robinson's Unification Algorithm allows the computation of the greatest common instance of any finite set of unifiable atomic formulas. This suggests the existence of a dual operation of `least common generalization'. It turns out that such an operation exists and can be computed by a simple algorithm.

The method of least general generalisations is based on subsumption.

The least general generalisation of

p(g(a), a) (6)

and

p(g(b), b) (7)

is

p(g(X), X) (8)

Under the substitution {a/X}, (8) is equivalent to (6), and under the substitution {b/X}, (8) is equivalent to (7). Therefore, the least general generalisation of (6) and (7) is (8).
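Plotkin's least general generalisation of terms can be computed by the simple algorithm Reynolds alludes to: descend through matching functors and introduce one fresh variable per distinct pair of mismatched subterms. The Python below is a sketch under our own term representation (tuples for compound terms, strings for constants).

```python
# Least general generalisation (lgg) of two terms. Terms are tuples
# (functor, args...); constants are strings. The same pair of
# mismatched subterms is always mapped to the same fresh variable.

def lgg(a, b, pairs=None):
    if pairs is None:
        pairs = {}
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and len(a) == len(b) and a[0] == b[0]):
        # Matching functors: generalise argument by argument.
        return (a[0],) + tuple(lgg(x, y, pairs) for x, y in zip(a[1:], b[1:]))
    if a == b:
        return a
    # Mismatch: reuse the variable already chosen for this pair, if any.
    if (a, b) not in pairs:
        pairs[(a, b)] = "X%d" % len(pairs)
    return pairs[(a, b)]

# lgg of p(g(a), a) and p(g(b), b) is p(g(X), X), as in (6)-(8):
print(lgg(("p", ("g", "a"), "a"), ("p", ("g", "b"), "b")))
```

Note that the mismatched pair (a, b) occurs twice but yields the *same* variable both times; dropping that bookkeeping would give the strictly more general, and therefore not least general, term p(g(X), Y).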

Buntine (1988) pointed out that simple subsumption is unable to take advantage of background information which may assist generalisation. Suppose we are given two instances of a concept cuddly_pet,

(9)
(10)

Suppose we also know the following:

(11)
(12)

According to subsumption, the least general generalisation of (9) and (10) is:

(13)

since unmatched literals are dropped from the clause. However, given the background knowledge, we can see that this is an over-generalisation. A better one is:

(14)

The moral is that a generalisation should only be performed when the relevant background knowledge suggests it. So, observing (9), we use clause (11) as a rewrite rule to produce a generalisation, namely clause (14), which also subsumes clause (10).
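Buntine's point can be sketched in a propositional setting. The clauses below are hypothetical stand-ins (the numbered clauses (9)-(14) are not reproduced in this text): clause bodies are sets of literals, and background rules such as `pet :- dog` and `pet :- cat` are applied to "saturate" each body with its implied literals before the bodies are intersected.

```python
# Generalisation relative to background knowledge (a propositional
# sketch). Background rules map a literal to one it implies,
# e.g. pet :- dog.  pet :- cat.
BACKGROUND = {"dog": "pet", "cat": "pet"}

def saturate(body):
    """Add every literal implied by a background rule."""
    return body | {BACKGROUND[l] for l in body if l in BACKGROUND}

def generalise(body1, body2):
    """Generalisation = common literals of the saturated bodies."""
    return saturate(body1) & saturate(body2)

# Hypothetical instances:  cuddly_pet :- fluffy, dog.
#                          cuddly_pet :- fluffy, cat.
print(generalise({"fluffy", "dog"}, {"fluffy", "cat"}))
```

Without the background rules the intersection would contain only `fluffy`, the over-generalisation; with them, the literal `pet` survives, matching the tighter generalisation the text describes.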

Buntine drew on earlier work by Sammut (Sammut and Banerji, 1986) in constructing his generalised subsumption. Muggleton and Buntine (1988) took this approach a step further and realised that, through the application of a few simple rules, they could invert resolution, as Plotkin and Reynolds had wished. Here are two of the rewrite rules in propositional form:

Given a set of clauses, the body of one of which is completely contained in the bodies of the others, such as:

X ← A ∧ B ∧ C ∧ D ∧ E
Y ← A ∧ B ∧ C

the absorption operation results in:

X ← Y ∧ D ∧ E
Y ← A ∧ B ∧ C

Intra-construction takes a group of rules all having the same head, such as:

X ← B ∧ C ∧ D ∧ E
X ← A ∧ B ∧ D ∧ F

and replaces them with:

X ← B ∧ D ∧ Z
Z ← C ∧ E
Z ← A ∧ F

These two operations can be interpreted in terms of the proof tree shown in Figure 10. Resolution accepts two clauses and applies unification to find the most general unifier of their complementary literals. In the diagram, two clauses at the top of a "V" are resolved to produce the resolvent at the apex of the "V". Absorption accepts the resolvent and one of the other two clauses to produce the third. Thus, it inverts the resolution step.

Intra-construction automatically creates a new term in its attempt to simplify descriptions. This is an essential feature of inverse resolution since there may be terms in a theory that are not explicitly shown in an example and may have to be invented by the learning program.
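The two operators can be sketched directly in propositional form. The Python below is our own rendering: a clause is a pair `(head, body_literals)`, absorption rewrites a clause whose body contains another clause's body, and intra-construction factors the common body literals out of two same-headed rules, inventing the new term (`Z` here) that the text describes.

```python
# Propositional inverse-resolution operators (a sketch).
# A clause is (head, set_of_body_literals).

def absorption(c1, c2):
    """If body(c1) is contained in body(c2), rewrite c2 so that it
    refers to head(c1) instead of repeating body(c1)."""
    (h1, b1), (h2, b2) = c1, c2
    assert b1 <= b2, "absorption requires body(c1) within body(c2)"
    return (h2, (b2 - b1) | {h1})

def intra_construction(c1, c2, new_term="Z"):
    """Two rules with the same head: keep their common body literals
    and invent a new term covering the differences."""
    (h, b1), (h2, b2) = c1, c2
    assert h == h2, "intra-construction requires a common head"
    common = b1 & b2
    return [(h, common | {new_term}),
            (new_term, b1 - common),
            (new_term, b2 - common)]

# X <- A,B,C,D,E  absorbed against  Y <- A,B,C  gives  X <- Y,D,E
print(absorption(("Y", {"A", "B", "C"}),
                 ("X", {"A", "B", "C", "D", "E"})))
```

Applying `intra_construction` to `X <- B,C,D,E` and `X <- A,B,D,F` yields `X <- B,D,Z` together with the definitions `Z <- C,E` and `Z <- A,F`: the literal `Z` appears in no input clause, which is the predicate invention referred to above.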

For example, the realities of drug design require descriptive powers that
encompass stereo-spatial and other long-range relations between different parts
of a molecule, and can generate, in effect, new theories. The pharmaceutical
industry spends over $250 million for each new drug released onto the market.
The greater part of this expenditure reflects today's unavoidably "scatter-gun"
synthesis of compounds which *might* possess biological activity. Even a
limited capability to construct predictive theories from data promises high
returns.

The relational program Golem was applied to the drug design problem of
modelling structure-activity relations (King *et al.*, 1992). The training data
for the program consisted of 44 trimethoprim analogues and their observed inhibition of
E. coli dihydrofolate reductase. A further 11 compounds were used as unseen
test data. Golem obtained rules that were statistically more accurate on the
training data and also better on the test data than a Hansch linear regression
model. Importantly, relational learning yields understandable rules that
characterise the stereochemistry of the interaction of trimethoprim with
dihydrofolate reductase observed crystallographically. In this domain,
relational learning thus offers a new approach which complements other methods,
directing the time-consuming process of the design of potent pharmacological
agents from a lead compound, variants of which need to be characterised for
likely biological activity before committing resources to their synthesis.
