### Solutions to Revision Exercises - Prolog, Knowledge Representation, Natural Language Processing, Machine Learning

1. (15 marks) Prolog Programming

1. Given the Prolog database and rules shown below, what is the output of the Prolog query that follows?

```
child_of(mary, steve).
child_of(mary, anne).
child_of(alice, anne).
child_of(alice, steve).
child_of(jane, steve).
child_of(leslie, steve).
female(mary).
female(alice).
female(jane).

sisters(Person1, Person2) :-
    child_of(Person1, Parent1),
    child_of(Person1, Parent2),
    child_of(Person2, Parent1),
    child_of(Person2, Parent2),
    not(Parent1 = Parent2),
    female(Person1),
    female(Person2),
    not(Person1 = Person2).
```

Query:

`?- sisters(Sis1, Sis2).`

```
Sis1 = mary,
Sis2 = alice;

Sis1 = mary,
Sis2 = alice;

Sis1 = alice,
Sis2 = mary;

Sis1 = alice,
Sis2 = mary;

false.
```

Comment: Essentially there are two persons here who are sisters, namely `mary` and `alice`, so we intuitively expect a single answer, effectively verifying that `sisters(mary, alice).` is true. However, Prolog also proves `sisters(alice, mary).`, which doubles the number of solutions found. It also proves each of these two ways, binding `Parent1` to `steve` and `Parent2` to `anne` in one case, and vice-versa in the other case, again doubling the number of solutions (to give 4 altogether).
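As an aside (not part of the model answer), one common way to suppress the symmetric duplicates is to impose the standard order of terms with `@</2` on both the sisters and the parents. The predicate name `sisters_once` below is our own; the facts are repeated from the question so the sketch is self-contained.

```prolog
child_of(mary, steve).
child_of(mary, anne).
child_of(alice, anne).
child_of(alice, steve).
child_of(jane, steve).
child_of(leslie, steve).
female(mary).
female(alice).
female(jane).

% Each pair of sisters is reported exactly once: Pa @< Pb fixes one
% ordering of the two shared parents, and P1 @< P2 fixes one ordering
% of the two sisters (and also guarantees they are distinct).
sisters_once(P1, P2) :-
    child_of(P1, Pa),
    child_of(P2, Pa),
    child_of(P1, Pb),
    child_of(P2, Pb),
    Pa @< Pb,
    female(P1),
    female(P2),
    P1 @< P2.
```

With this version, `?- sisters_once(Sis1, Sis2).` succeeds just once, with `Sis1 = alice, Sis2 = mary` (alphabetical order).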

2. Does the following Prolog query succeed? If not, explain why not. If so, what bindings will be reported?

`?- believes(X, likes(mary, pizza)) = believes(frank, likes(Y, pizza)).`

```
X = frank,
Y = mary
```

Comment: Because X and Y start with capital letters, they are variables, so they match the "atoms" `frank` and `mary` on the opposite side of the equals sign.

Integers and floating point numbers are built into most programming languages, including Prolog. However, suppose that numbers and arithmetic operations were not available. It is still possible to define numbers in Prolog and to write programs to implement simple arithmetic operations. In the following questions, you MUST NOT use any of Prolog's built-in arithmetic predicates such as is, <, +, *, etc. (except in question 6). Also, do not use the cut operation (!) in your program.

A positive integer can be defined as zero or the successor of another number. Thus, zero can be represented by 0 and a number, such as two, can be represented by s(s(0)), where two is the successor of the successor of 0. Thus s(X) can be thought of as X+1. We call numbers written like this s-expressions.

In questions 3 to 6, don't worry about what happens if X (and Y and Z) are not s-expressions. You don't have to cope with this, and in particular, you don't have to check for this.

See also the Wikipedia entry on the Peano axioms. This link is just for interest - the material at the link is not examinable.

3. Write a Prolog predicate, s_number(X), which is true when X is a valid number. For example:
```
?- s_number(s(s(0))).
true.
```

```
s_number(0).
s_number(s(X)) :-
    s_number(X).
```

4. Write a Prolog predicate, minus1(X,Y), which is true if X is the successor of Y, i.e. if X = Y+1, or Y=X-1. Naturally, this predicate will fail if X is 0.
```
?- minus1(s(s(0)), Y).
Y = s(0)
?- minus1(0, Y).
false.
```

```
minus1(s(X), X).
```

5. Write a Prolog predicate, subtract(X, Y, Z), which is true if Z represents the difference between X and Y, assuming X is greater than or equal to Y. If X is less than Y, then this predicate is undefined.
```
?- subtract(s(s(s(s(0)))), s(s(0)), Z).
Z = s(s(0))
```

```
subtract(X, 0, X).
subtract(s(X), s(Y), Z) :-
    subtract(X, Y, Z).
```

6. Write a Prolog predicate value_of(X, Y), which is true if Y is the natural number, in normal notation, corresponding to the s(s(...)) expression X.
```
?- value_of(s(s(s(s(0)))), Y).
Y = 4
```

```
value_of(0, 0).
value_of(s(X), Y) :-
    value_of(X, Z),
    Y is Z + 1.
```
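A natural companion exercise (not one of the set questions) is addition on s-expressions, again without built-in arithmetic. The predicate name `add/3` is our own; it follows the same recursive pattern as `subtract/3` above.

```prolog
% add(X, Y, Z): Z is the s-expression sum of s-expressions X and Y.
% Base case: 0 + Y = Y.
add(0, Y, Y).
% Recursive case: s(X) + Y = s(X + Y).
add(s(X), Y, s(Z)) :-
    add(X, Y, Z).
```

For example, `?- add(s(s(0)), s(0), Z).` gives `Z = s(s(s(0)))`, i.e. 2 + 1 = 3.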

2. Knowledge Representation (5 marks)

1. Draw a semantic network to represent the sentence "Fred finds a furry purple cube."

NB: This question is about semantic nets, not NLP.

2. Briefly describe the function of each part of the match-resolve-act cycle in a rule-based system.

loop

• match conditions of rules with contents of working memory
• if no rule matches then stop
• resolve conflicts - i.e. if more than one rule is matched, and so apparently eligible to fire, then choose one of them
• act - i.e. perform conclusion part of rule
end loop

3. Briefly describe two types of demon used in the frame system described in lectures.

• if_added demons are triggered when a new value is put into a slot.
• if_removed demons are triggered when a value is removed from a slot.
• if_replaced demons are triggered when a slot value is replaced.
• if_needed demons are triggered when there is no value present in an instance frame and a value must be computed from a generic frame.
• if_new demons are triggered when a new frame is created.
• range demons are triggered when a new value is added; the value must satisfy the range constraint specified for the slot.
• help demons are triggered when the range demon is triggered and returns false.

3. Natural Language Processing - Short Answers (13 marks) Answer each of the following in no more than a few words each:

1. What is the significance of a feature like VAR being a head feature for categories like S, VP, NP?

Answer: Head feature values are inherited from the head subconstituent of the constituent: thus an S inherits its VAR feature from its VP, a VP from its V, and an NP from its head N, NAME, or PRO.

2. What form can grammar rules take in a regular grammar?

Answer: A → t or A → t N, where A and N are non-terminals, and t is a terminal symbol.

3. Give an example of a predicate operator in the logical form language.

Answer: plur transforms a predicate like dog1, which is true of a single dog, into a predicate plur(dog1) which is true of a set of more than one dog.

4. Name three more components of the logical form language described in lectures, and give an example of them.

Answer: As well as predicate operators, there are the following components to the logical form language:
• terms, like fido1 and name(j1, 'John')
• predicates, like bites1
• logical operators, like not, &
• propositions, like bites1(fido1, postman1)
• quantifiers, like every, most, the
• modal operators, like believe, pres, assert
• term constructors, like pro(—, —) and name(—, —)

5. What is the result of applying the λ-expression λ x likes(x, pizza) to mary, after λ-reduction?

Answer: likes(mary, pizza).

6. What is meant by the term subcategorization?

Answer: Subcategorization refers to the property of verbs (and adjectives) of allowing only certain patterns of complement structure to follow them. For example, laugh cannot be followed by an NP in English, (so it subcategorizes `none`) and tell can be followed by a NP and then a VP in the infinitive form, such as tell Peter to buy petrol (so it subcategorizes `np_vp:inf`).

7. Give an example of a sentence that exhibits structural ambiguity, and draw two trees showing the ambiguity.

Answer: One standard example is "I saw the man with the telescope": the PP with the telescope can attach to the NP (the man has the telescope) or to the VP (the telescope was used for the seeing). The two trees correspond to the bracketings (S (NP I) (VP (VERB saw) (NP (NP the man) (PP with the telescope)))) and (S (NP I) (VP (VERB saw) (NP the man) (PP with the telescope))).

8. What general form can the rules of a context-sensitive grammar take?

Answer: Two possible answers: here is one - rules must be of the form λAρ → λσρ where λ and ρ are (possibly empty) strings of alphabet symbols, A is a non-terminal, and σ is a non-empty string of alphabet symbols.

9. Write down the condition under which two feature values agree, and apply this to the two feature values {3s,3p} and {1s,2s,1p,2p,3p}.

Answer: Two feature values unify by intersection, and they agree if they have non-empty intersection.
{3s,3p} ∩ {1s,2s,1p,2p,3p} = {3p}, which is non-empty, so these two features agree.
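This agreement test is easy to sketch in Prolog (our own encoding, not from the lectures): represent each feature value as a list of atoms and test for a non-empty intersection. The predicate name `agree/3` is our own.

```prolog
:- use_module(library(lists)).  % for intersection/3

% agree(V1, V2, V): feature values V1 and V2 (lists of atoms) agree,
% and V is their non-empty intersection, i.e. the unified value.
agree(V1, V2, V) :-
    intersection(V1, V2, V),
    V \= [].
```

For example, `?- agree(['3s','3p'], ['1s','2s','1p','2p','3p'], V).` gives `V = ['3p']`, matching the worked answer above.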

4. Given the lexicon and grammar information below, trace the steps in chart-parsing the two sentences Time flies. It flies like an arrow.

Lexicon must cover any words to be processed.
• time: NOUN, VERB
• flies: NOUN, VERB
• it: PRONOUN
• like: PREP, ADJ, VERB, NOUN, ADV, CONJ
• an: DET
• arrow: NOUN

Grammar
• S → NP VP (G-S1)
• S → VERB NP (G-S2)
• NP → NOUN (G-NP1)
• NP → PRONOUN (G-NP2)
• NP → DET NOUN (G-NP3)
• PP → PREP NP (G-PP1)
• VP → VERB (G-VP1)
• VP → VERB PP (G-VP2)

Note: The numbers that are part of active arc names are not especially significant. All that matters is that different arcs have different names.

*sentence*: time *current-position*: 1
*constituents*:
NOUN1: NOUN "time" from 0 to 1
VERB1: VERB "time" from 0 to 1
NP1: NP NOUN1 by rule G-NP1 from 0 to 1
VP1: VP VERB1 by rule G-VP1 from 0 to 1
*active-arcs*:
ARC1: S NP1 • VP by rule G-S1 from 0 to 1
ARC2: S VERB1 • NP by rule G-S2 from 0 to 1
ARC3: VP VERB1 • PP by rule G-VP2 from 0 to 1

*sentence*: time flies *current-position*: 2
NOUN2: NOUN "flies" from 1 to 2
VERB2: VERB "flies" from 1 to 2
NP2: NP NOUN2 by rule G-NP1 from 1 to 2
VP2: VP VERB2 by rule G-VP1 from 1 to 2
S1: S VERB1 NP2 by rule G-S2 from ARC2 from 0 to 2
S2: S NP1 VP2 by rule G-S1 from ARC1 from 0 to 2
ARC4: S NP2 • VP by rule G-S1 from 1 to 2
ARC5: S VERB2 • NP by rule G-S2 from 1 to 2
ARC6: VP VERB2 • PP by rule G-VP2 from 1 to 2

Full parses:
(S (NP (NOUN "time")) (VP (VERB "flies")))
(S (VERB "time") (NP (NOUN "flies")))

*sentence*: it *current-position*: 1
*constituents*:
PRON2: PRON "it" from 0 to 1
NP1: NP PRON2 by rule G-NP2 from 0 to 1
*active-arcs*:
ARC1: S NP1 • VP by rule G-S1 from 0 to 1

*sentence*: it flies *current-position*: 2
NOUN1: NOUN "flies" from 1 to 2
VERB1: VERB "flies" from 1 to 2
NP2: NP NOUN1 by rule G-NP1 from 1 to 2
VP1: VP VERB1 by rule G-VP1 from 1 to 2
S1: S NP1 VP1 from ARC1 from 0 to 2
ARC2: S NP2 • VP by rule G-S1 from 1 to 2
ARC3: VP VERB1 • PP by rule G-VP2 from 1 to 2
ARC4: S VERB1 • NP by rule G-S2 from 1 to 2

*sentence*: it flies like *current-position*: 3
NOUN2: NOUN "like" from 2 to 3
CONJ2: CONJ "like" from 2 to 3
PREP1: PREP "like" from 2 to 3
VERB2: VERB "like" from 2 to 3
NP3: NP NOUN2 by rule G-NP1 from 2 to 3
S2: S VERB1 NP3 from ARC4 from 1 to 3
VP2: VP VERB2 by rule G-VP1 from 2 to 3
S3: S NP2 VP2 from ARC2 from 1 to 3
ARC5: S NP3 • VP by rule G-S1 from 2 to 3
ARC6: PP PREP1 • NP by rule G-PP1 from 2 to 3
ARC7: VP VERB2 • PP by rule G-VP2 from 2 to 3
ARC8: S VERB2 • NP by rule G-S2 from 2 to 3

*sentence*: it flies like an *current-position*: 4
DET1: DET "an" from 3 to 4
ARC9: NP DET1 • NOUN by rule G-NP3 from 3 to 4

*sentence*: it flies like an arrow *current-position*: 5
NOUN3: NOUN "arrow" from 4 to 5
NP5: NP NOUN3 by rule G-NP1 from 4 to 5
NP4: NP DET1 NOUN3 from ARC9 from 3 to 5
PP1: PP PREP1 NP4 from ARC6 from 2 to 5
S4: S VERB2 NP4 from ARC8 from 2 to 5
VP3: VP VERB1 PP1 from ARC3 from 1 to 5
S5: S NP1 VP3 from ARC1 from 0 to 5
ARC10: S NP5 • VP by rule G-S1 from 4 to 5
ARC11: S NP4 • VP by rule G-S1 from 3 to 5

Full parse:
 (S (NP (PRON "it")) (VP (VERB "flies") (PP (PREP "like") (NP (DET "an") (NOUN "arrow")))))

5. Machine Learning (11 marks)

1. (6 marks) Describe the information gain calculations that are used in ID3 to decide which attribute to "split" on.

Answer: For solution, see lecture notes. Here's a concrete example, from question 3 of the Machine Learning Exercises.

The attributes and their possible values are:
texture: smooth wavy rough
temperature: cold cool warm hot
size: small medium large

| texture | temperature | size | class |
|---------|-------------|--------|-------|
| smooth | cold | large | yes |
| smooth | cold | small | no |
| smooth | cool | large | yes |
| smooth | cool | small | yes |
| smooth | hot | small | yes |
| wavy | cold | medium | no |
| wavy | hot | large | yes |
| rough | cold | large | no |
| rough | cool | large | yes |
| rough | hot | small | no |
| rough | warm | medium | yes |

There are 7 positive examples and 4 negative ones overall, so the overall entropy is –(7/11)×log2(7/11) – (4/11)×log2(4/11) which is about 0.94566.

In the exam, you will not have a calculator. You will not, of course, be expected to work out the logarithms by hand. :-) You should therefore leave the entropy as an unevaluated expression: –(7/11)×log2(7/11) – (4/11)×log2(4/11).

There are 3 values for the texture attribute: smooth, wavy and rough.

For smooth, there are 4 positive examples and 1 negative, so the entropy is -(4/5)×log2(4/5) - (1/5)×log2(1/5) which is about 0.7219.

(Again, in the exam, you would leave this as an unevaluated expression.)
For wavy there is one positive and one negative example, so the entropy is -(1/2)×log2(1/2) - (1/2)×log2(1/2) = 1.

(I would expect you to know that log2(1/2) = –1, and so evaluate this expression.)

For rough there are two positive and two negative examples, so the entropy is -(1/2)×log2(1/2) - (1/2)×log2(1/2) = 1.

Expected information for a split on texture is thus:

0.7219×(5/11) + 1×(2/11) + 1×(4/11) = 0.8736

Expected information gain for a split on texture is thus 0.94566 - 0.8736 = 0.07206.
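The arithmetic above can be checked mechanically. Here is an illustrative Prolog sketch (not part of the model answer; the predicate names entropy/3, part/3 and texture_gain/1 are our own) that reproduces the entropy and information-gain figures for the split on texture.

```prolog
% entropy(Pos, Neg, E): E is the entropy of a class split with
% Pos positive and Neg negative examples.
entropy(P, N, E) :-
    T is P + N,
    part(P, T, Ep),
    part(N, T, En),
    E is Ep + En.

% part(Count, Total, X): the contribution -(C/T)*log2(C/T),
% taking 0*log2(0) to be 0 by convention.
part(0, _, 0).
part(C, T, X) :-
    C > 0,
    F is C / T,
    X is -F * log(F) / log(2).   % log/1 is natural log; divide for log2

% Expected information for the split on texture, and the resulting gain.
texture_gain(Gain) :-
    entropy(7, 4, E0),        % overall: 7 positive, 4 negative
    entropy(4, 1, Smooth),    % smooth: 4 positive, 1 negative
    entropy(1, 1, Wavy),      % wavy: 1 positive, 1 negative
    entropy(2, 2, Rough),     % rough: 2 positive, 2 negative
    Split is (5/11)*Smooth + (2/11)*Wavy + (4/11)*Rough,
    Gain is E0 - Split.
```

Querying `?- texture_gain(G).` gives G ≈ 0.072, agreeing with the hand calculation.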

2. (5 marks)

1. State the delta rule for backpropagation learning and indicate the significance of each symbol used in its statement.

Answer: Δwji = η × δj × yi, where Δwji is the change to the weight on the connection from unit i to unit j, η is the learning rate, δj is the local gradient at unit j (whose definition is complex and goes beyond what is required for this question), and yi is the input signal from unit i to unit j.

2. Draw a diagram of a 3-layer feedforward neural network, with 3 neurons in each layer, labelling the layers appropriately and indicating the connections between neurons.