COMP9414 / COMP9814 2012s1 NLP Assignment



Changes to Specification

Remember to check this section for any updates to the specification.
19 May: Minor correction: sem of "fed" changed to "fed1". The sample outputs are not affected.

22 May: This is not a change but a remark. nlpbasis.pro contains lexicon entries for adjectives, and some word facts for "the red car". However, this spec does not mention adjectives, and your program does not have to (should not) handle adjectives. In fact, the adjective material is left over from an earlier year's version of this assignment. I have added some comments to nlpbasis.pro to clarify this.

Introduction

This assignment is to be completed by all COMP9414 and COMP9814 students. COMP9814 students should also do some extra work.

This specification is reasonably long. Do not panic: much of it consists of examples of code and execution! We provide two files, nlptemplate.pro and nlpbasis.pro, that you should modify. Download them both and put them in the same directory. Rename nlptemplate.pro as nlp-soln.pl, because you will be converting it into the solution you will hand in.

Note that we will test your submitted nlp-soln.pl with our own, different, nlpbasis.pro. Also note that the examples that appear below are provided in nlpbasis.pro. Make sure you put the right things in the right places! In particular, to avoid warnings from the Prolog interpreter, you'll need to put all the facts and rules for any given predicate together - e.g. all the lexical entries for nouns need to be together, and all the rules for noun phrases need to be together.


=.. (pronounced "univ")

Note you'll probably need to make use of the =.. meta-predicate, which converts a list into a term - e.g. it can convert [likes, mary, pizza] into likes(mary, pizza). It is an infix predicate, that is, it appears between the arguments, just as the infix predicate "<" appears between its arguments in e.g. X < 3. You can find details on =.. in the textbook (Bratko, p155) and/or by asking SWI Prolog the query help(=..)., or by going to the Prolog dictionary.

You should only use =.. if there is no other way of doing things. A typical case is where the name of the functor of the term that you are trying to build is bound to a variable.
If this is not the situation, then you can build a term the way we did it in the syntax analysis code that we studied in lectures: e.g.

     s(P1,P3,s(NP,VP)) :- 
            np(P1,P2,NP), vp(P2,P3,VP).
If NP is bound to, say, np(name(john)), and VP is bound to vp(v(eats), np(det(the), noun(pizza))), then this rule will build the term s(np(name(john)), vp(v(eats), np(det(the), noun(pizza)))) in the head of the rule. =.. is not needed for this, and should not be used.
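By contrast, here is a case where =.. is the right tool: the functor name arrives bound to a variable at run time, so the term cannot be written out literally. This is a sketch only; build_term/3 is an invented helper for illustration, not part of the assignment:

```prolog
% build_term(+Functor, +Args, -Term): Functor is an atom bound at
% run time, so =.. is needed to assemble the term from a list of
% the form [Functor | Args].
build_term(Functor, Args, Term) :-
    Term =.. [Functor | Args].

% Example query:
% ?- build_term(likes, [mary, pizza], T).
% T = likes(mary, pizza).
```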


Aim

In this assignment, you are to modify the Prolog code for parsing simple sentences (described in lectures, and listed below) to perform semantic analysis as well as syntactic analysis:

Here's the original parsing code from the lecture notes (these clauses belong in your nlp-soln.pl after you have modified them):

% grammar rules:
s(P1,P3,s(SynNP,SynVP)) :-
	np(P1,P2,SynNP),
	vp(P2,P3,SynVP).
vp(P1,P3,vp(SynVerb,SynNP)) :-
	v(P1,P2,SynVerb),
	np(P2,P3,SynNP).
np(P1,P2,np(SynName)) :-
	proper(P1,P2,SynName).
np(P1,P3,np(SynDet,SynNoun)) :-
	det(P1,P2,SynDet),
        noun(P2,P3,SynNoun).

% lexicon entries:
isname(john).
isverb(feeds).
isdet(the).
isnoun(numbat).

% rules to build lexical constituents:
det(From, To, det(Word)) :-
	word(Word, From, To),
	isdet(Word).
noun(From, To, noun(Word)) :-
	word(Word, From, To),
	isnoun(Word).
v(From, To, v(Word)) :-
	word(Word, From, To),
	isverb(Word).
proper(From, To, name(Word)) :-
	word(Word, From, To),
	isname(Word).


Expanding Lexicon Entries

We will need to expand the lexicon entries as well as the grammar rules, to provide the sem features. We will do this without using the sem keyword - instead the second argument to the lexicon entry will be the semantic information.

isname(john, 'John').
isverb(feeds, feed1).
isdet(the, the1).
isnoun(numbat, numbat1).

This will in turn necessitate changes to any code that uses the lexicon entry, for example the rules that build lexical constituents, e.g.

det(From, To, det(Word), Sem) :-
     word(Word, From, To),
     isdet(Word, Sem).

Var features

The other thing we need is a mechanism to create var features for constituents. For lexical constituents, the easy way to do this is to attach a var to each word, like this:

word(john, 1, 2, j1).
word(feeds, 2, 3, e1).
word(the, 3, 4, t1).
word(numbat, 4, 5, c1).

Note that such clauses belong in nlpbasis.pro.


Building Lexical Constituents

Now we need to modify the rules that build lexical constituents again, e.g.

det(From, To, det(Word), Sem, Var) :-
     word(Word, From, To, Var),
     isdet(Word, Sem).

We are coding the var feature positionally, as we did with sem (and with the syntactic analysis). It's up to you to modify the rules that build phrasal constituents (like NP, VP, and S) so that they pass around the var features appropriately. Your other job, in fact your main job, is to construct and pass around the sem features for the phrasal constituents. You will do this by modifying the grammar rules.
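To illustrate the var-passing only (constructing the sem features for the other phrasal rules is the part left to you), the simplest phrasal rule, np → proper, might look like the sketch below. It assumes your proper/5 rule already builds a suitable sem, such as name(j1, 'John'):

```prolog
% np -> proper: a one-word NP simply passes the sem and var
% features of the proper noun up to the NP level.
np(P1, P2, np(SynName), Sem, Var) :-
    proper(P1, P2, SynName, Sem, Var).
```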


λ-Reduction

One point that you might find tricky is λ-reduction, although in fact λ-reduction turns out to be implementable as a single simple rule, plus a "bypass" rule. To give you a start, here are the rules:

% lambda_reduce(+Expression, +Argument, -Result):
% if Expression starts with lambda, unify the lambda
% variable X in Expression with Argument. Result is the
% Predicate from the Expression, post-unification.
% The main part of the lambda_reduce predicate can be expressed as:
% lambda_reduce(lambda(X, Predicate), Argument, Predicate) :-
%     X = Argument.
% This says that if the first argument is a lambda-expression,
% then unify X with Argument, and then the third argument (Predicate)
% gives you the result of lambda-reduction. However, while this is
% easier for humans to understand, it has an unnecessary unification.
% The rule below does the same thing without the extra unification:
lambda_reduce(lambda(Argument, Predicate), Argument, Predicate).
% If the Expression is not a lambda-expression, the Result
% is Expression, unchanged.
lambda_reduce(Expression, _, Expression) :-
     Expression \= lambda(_, _).

In other words, in the human-readable version, the condition on the right of the neck forces the Argument to unify with the λ-variable X; in the compact rule, the same unification happens directly in the head. Either way, the copy of the Predicate that is the final parameter of lambda_reduce allows you to retrieve the Result of the λ-reduction.
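For example, with the VP and NP sems from the sample output below, lambda_reduce behaves as follows (the variable names are illustrative):

```prolog
?- lambda_reduce(lambda(X, feed1(f1, X, the1(n1, numbat1))),
                 name(j1, 'John'), Result).
X = name(j1, 'John'),
Result = feed1(f1, name(j1, 'John'), the1(n1, numbat1)).

?- lambda_reduce(numbat1, anything, Result).  % not a lambda-expression
Result = numbat1.                             % the "bypass" rule applies
```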


Example Query No. 1

As an example of the final performance of your system, you should be able to handle all of the phrasal constituent types as exemplified below (note that nlpbasis.pro has word/4 definitions for "John feeds the numbat" but not for the lexicon - that's left to you):

% prolog -s nlp-soln.pl
[...]

?- np(1,2,Syn,Sem,Var).

Syn = np(name(john)),
Sem = name(j1, 'John'),
Var = j1 ;               ← user presses ";"
false.

?-  np(3,5,Syn,Sem,Var).

Syn = np(det(the), noun(numbat)),
Sem = the1(n1, numbat1),
Var = n1 ;
false.

?- vp(2,5,Syn,Sem,Var).

Syn = vp(v(feeds), np(det(the), noun(numbat))),
Sem = lambda(_G248, feed1(f1, _G248, the1(n1, numbat1))),
Var = f1 ;
false.

?- s(1,5,Syn,Sem,Var).

Syn = s(np(name(john)), vp(v(feeds), np(det(the), noun(numbat)))),
Sem = assert(feed1(f1, name(j1, 'John'), the1(n1, numbat1))),
Var = f1 ;
false.

?- control-D
%

The exact identity of the variable in the lambda-expression (_G248 in the example) may vary - all that matters is that it is a variable.

Note that you need to include the speech act operator assert in the logical form, but you don't have to worry about the tense operator (past / pres / fut).


Required coverage 0

Note that each grammar rule must be implemented by a single Prolog rule, and that grammar rules that imply a lambda-reduction must use lambda_reduce to do it.


Required coverage 1: the original 4 rules

The parts of speech to be covered are: aux, det, pro, proper, noun, v (where pro signifies pronoun). You must use these names for the parts of speech. The grammar rules to be covered are the four from the original parsing code above:

s → np vp
vp → v np
np → proper
np → det noun


Required coverage 2: vp → aux vp


Required coverage 3: np → pro


Required coverage 4: ynqs → aux np vp


Required Coverage 5: vp → v np np


Where to Start

It is probably advisable to start work just with the four basic rules and associated parts of speech; later you can refine your program to add the extra parts of speech and rules.

When your completed assignment is tested by the marking system, the lexicon entries and words used will not be those in the version of nlpbasis.pro distributed with this specification. The test data will use the same format as in the distributed version of nlpbasis.pro. Thus you must not modify the format of the lexicon entries and words!

You do not have to cope with things like past(feed1).


Format of lexicon entries

In order to get your rules for building lexical constituents right, you need to have the right format for the lexicon entries: here are example lexicon entries for each of the six required lexical categories:

isaux(can, can1).
isdet(the, the1).
isname(jane, 'Jane').
isnoun(numbat, numbat1).
ispro(he, he1).
isverb(see, see1).

You are likely to need at least one word from every lexical category that your grammar covers in your test lexicon in nlpbasis.pro. If your lexicon does not contain any pronoun, for example, and you try to parse something, you may get an error message like:

ERROR: Undefined procedure: ispro/2

even though there is no pronoun in your test sentence.


Removing Warnings

The most common problem is a "singleton variable" warning, issued when a variable occurs just once in a clause. This is sometimes intentional, and sometimes a typographic error. If the singleton is intentional, suppress the warning by prefixing an underscore "_" to the variable name; if it is a typo, fix it.

Tip: To make it easier for you to tell when you have warnings, try loading up your code using the following variant of the prolog command:

prolog -q -s nlp-soln.pl
The -q "flag" stands for "quiet", and it causes the Prolog system to omit the welcome, copyright, and help information. This makes the warnings and error messages, if any, much more obvious.


Marking Criteria

The actual allocation of marks may differ slightly from what is shown below.

Testing (8 marks): Make sure your code works for all the sample inputs. Then create a completely different set of inputs, with associated word facts and lexicon entries, and make sure it works for those, too. When testing with the sample inputs, copy-paste the inputs to make sure you get them exactly right.

Readability & Style (2 marks): Things to check include (and this is not a complete list):

How to Submit

First run through the compliance check:

Submit your nlp-soln.pl using the UNIX command:

give cs9414 nlp nlp-soln.pl

This goes for both COMP9414 and COMP9814 students. Carefully check your submission.


Due date: 11.30pm on Friday of week 13 (Friday 1 June, 2012).

Don't forget to check that your code works on a CSE machine, in exactly the form that you will be handing it in, immediately before submission. Even if you just add an extra line of comments, re-test before submission!

The work you submit must be your own, except where you acknowledge another source.


Copyright © UNSW & Bill Wilson, 2012.
Bill Wilson's contact info

UNSW's CRICOS Provider No. is 00098G