
| Date | Change |
| --- | --- |
| 19 May | Minor correction: sem of "fed" changed to "fed1". The sample outputs are not affected. |
| 22 May | This is not a change but a remark. nlpbasis.pro contains lexicon entries for adjectives, and some word facts for "the red car". However, this spec does not mention adjectives, and your program does not need to (and should not) handle adjectives. The adjective material is left over from a version of this assignment from an earlier year. I have added some comments to nlpbasis.pro to clarify this. |
This assignment is to be completed by all COMP9414 and COMP9814 students. COMP9814 students should also do some extra work.
This specification is reasonably long. Do not panic: much of it consists of
examples of code and execution! We provide two files, nlptemplate.pro and nlpbasis.pro, that you should
modify. Download them both and put them in the same directory.
Rename nlptemplate.pro to nlp-soln.pl, because
you will be converting it into the solution you will hand in. Note that
we will test your submitted nlp-soln.pl with our own, different,
nlpbasis.pro. Also note that the examples that appear
below are provided in nlpbasis.pro.
Make sure you put the right things in the right places! In
particular, to avoid warnings from the Prolog interpreter,
you'll need to put all the facts and rules for any given predicate
together - e.g. all the lexical entries for nouns need to be
together, and all the rules for noun phrases need to be together.
Note that you'll probably need to make use of the =..
meta-predicate, which
converts between a list and a term: for example, it can convert
[likes, mary, pizza] into likes(mary, pizza), and vice versa.
It is an infix operator, that is, it appears between its
arguments, just as the infix predicate "<" appears
between its arguments in, e.g., X < 3.
You can find details on =.. in the textbook
(Bratko, p155), by asking SWI-Prolog the query
help(=..)., or by going to the
Prolog dictionary.
You should only use =.. if there is no other way of doing things.
A typical case is where the name of the
functor
of the term that you are trying to build is bound to a variable.
If this is not the situation, then you can build a term the way we did
it in the syntax analysis code that we studied in lectures, e.g.:

```prolog
s(P1, P3, s(NP, VP)) :-
    np(P1, P2, NP),
    vp(P2, P3, VP).
```

If NP is bound to, say,
np(name(john)),
and VP is bound to
vp(v(eats), np(det(the), noun(pizza))),
then this rule will build the term
s(np(name(john)), vp(v(eats), np(det(the), noun(pizza))))
in the head of the rule. =.. is not needed for this, and
should not be used.
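As a rough sketch of the distinction (assuming SWI-Prolog; the predicate names fixed_functor/3 and run_time_functor/4 are hypothetical, made up only for this illustration), the first clause below builds a term whose functor is known when the code is written, so ordinary unification suffices, while the second has to use =.. because the functor name only arrives at run time in a variable:

```prolog
% fixed_functor(+Subj, +Obj, -Term): hypothetical example.
% The functor likes is written directly in the clause head,
% so the term is built by ordinary unification; no =.. needed.
fixed_functor(Subj, Obj, likes(Subj, Obj)).

% run_time_functor(+Pred, +Subj, +Obj, -Term): hypothetical example.
% The functor name is only known at run time (it is bound to Pred).
% Writing Pred(Subj, Obj) is not legal Prolog syntax, so =..
% builds the term from the list [Pred, Subj, Obj] instead.
run_time_functor(Pred, Subj, Obj, Term) :-
    Term =.. [Pred, Subj, Obj].
```

For example, ?- run_time_functor(likes, mary, pizza, T). gives T = likes(mary, pizza), the same term that fixed_functor(mary, pizza, T) produces. Because =.. works in both directions, ?- likes(mary, pizza) =.. L. gives L = [likes, mary, pizza].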
In this assignment, you are to modify the Prolog code for parsing simple sentences (described in lectures, and listed below) to perform semantic analysis as well as syntactic analysis.
Here's the
original parsing code from the lecture notes (these clauses
belong in your nlp-soln.pl after you have modified them):

```prolog
% grammar rules:
s(P1, P3, s(SynNP, SynVP)) :-
    np(P1, P2, SynNP),
    vp(P2, P3, SynVP).

vp(P1, P3, vp(SynVerb, SynNP)) :-
    v(P1, P2, SynVerb),
    np(P2, P3, SynNP).

np(P1, P2, np(SynName)) :-
    proper(P1, P2, SynName).
np(P1, P3, np(SynDet, SynNoun)) :-
    det(P1, P2, SynDet),
    noun(P2, P3, SynNoun).

% lexicon entries:
isname(john).
isverb(feeds).
isdet(the).
isnoun(numbat).

% rules to build lexical constituents:
det(From, To, det(Word)) :-
    word(Word, From, To),
    isdet(Word).

noun(From, To, noun(Word)) :-
    word(Word, From, To),
    isnoun(Word).

v(From, To, v(Word)) :-
    word(Word, From, To),
    isverb(Word).

proper(From, To, name(Word)) :-
    word(Word, From, To),
    isname(Word).
```
We will need to expand the lexicon entries, as well as the grammar rules, to provide the sem features. We will do this without using the sem keyword: instead, the second argument of each lexicon entry will hold the semantic information.
```prolog
isname(john, 'John').
isverb(feeds, feeds1).
isdet(the, the1).
isnoun(numbat, numbat1).
```
This will in turn require changes to any code that uses the lexicon entries, such as the rules that build lexical constituents:

```prolog
det(From, To, det(Word), Sem) :-
    word(Word, From, To),
    isdet(Word, Sem).
```
The other thing we need is a mechanism to create var features for constituents. For lexical constituents, the easy way to do this is to attach a var to each word, like this:
```prolog
word(john, 1, 2, j1).
word(feeds, 2, 3, e1).
word(the, 3, 4, t1).
word(numbat, 4, 5, c1).
```
Note that such clauses belong in nlpbasis.pro.
Now we need to modify the rules that build lexical constituents again, e.g.
```prolog
det(From, To, det(Word), Sem, Var) :-
    word(Word, From, To, Var),
    isdet(Word, Sem).
```
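To see this lexical rule working in isolation, here is a minimal self-contained sketch that puts the sample word/4 facts, the isdet/2 entry and the det/5 rule from above into one file. It is only meant for experimenting: in the real setup the word facts (and your test lexicon) live in nlpbasis.pro, and the rules for the other lexical categories, which follow the same pattern, are yours to write.

```prolog
% Sample word facts (normally in nlpbasis.pro), copied from above:
% word(Word, From, To, Var).
word(john, 1, 2, j1).
word(feeds, 2, 3, e1).
word(the, 3, 4, t1).
word(numbat, 4, 5, c1).

% Lexicon entry: isdet(Word, Sem).
isdet(the, the1).

% Lexical rule from the text: syntax, sem and var are all passed
% back positionally in the 3rd, 4th and 5th arguments.
det(From, To, det(Word), Sem, Var) :-
    word(Word, From, To, Var),
    isdet(Word, Sem).
```

With this file loaded, ?- det(3, 4, Syn, Sem, Var). should give Syn = det(the), Sem = the1, Var = t1, while ?- det(1, 2, Syn, Sem, Var). fails, because john has no isdet/2 entry.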
We are coding the var feature positionally, as we did with sem (and with the syntactic analysis). It's up to you to modify the rules that build phrasal constituents (like NP, VP, and S) so that they pass around the var features appropriately. Your other job, in fact your main job, is to construct and pass around the sem features for the phrasal constituents. You will do this by modifying the grammar rules.
One point that you might find tricky is λ-reduction, although it turns out to be implementable as a single simple rule plus a "bypass" rule. To give you a start, here are the rules:
```prolog
% lambda_reduce(+Expression, +Argument, -Result):
% if Expression starts with lambda, unify the lambda
% variable X in Expression with Argument. Result is the
% Predicate from the Expression, post-unification.
% The main part of the lambda_reduce predicate can be expressed as:
%   lambda_reduce(lambda(X, Predicate), Argument, Predicate) :-
%       X = Argument.
% This says that if the first argument is a lambda-expression,
% then unify X with Argument, and then the third argument (Predicate)
% gives you the result of lambda-reduction. However, while this is
% easier for humans to understand, it has an unnecessary unification.
% The rule below does the same thing without the extra unification:
lambda_reduce(lambda(Argument, Predicate), Argument, Predicate).
% If the Expression is not a lambda-expression, the Result
% is Expression, unchanged.
lambda_reduce(Expression, _, Expression) :-
    Expression \= lambda(_, _).
```
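In other words, in the commented-out version, the goal X = Argument on the right of the neck forces the Argument to unify with the λ-variable X; the compact version gets the same unification for free in the head, by using Argument itself as the λ-variable. Either way, the Predicate returned as the final parameter of lambda_reduce, with its λ-variable now bound to Argument, is the Result of the λ-reduction.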
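A quick way to convince yourself of this is to call lambda_reduce directly on sem values taken from the sample outputs that follow: the VP sem from the vp(2,5,...) query and the NP sem from the np(1,2,...) query. The sketch below just repeats the two clauses given above so that it loads on its own; the queries in the comments are illustrations, not part of the required solution.

```prolog
% The two lambda_reduce/3 clauses, copied from above.
lambda_reduce(lambda(Argument, Predicate), Argument, Predicate).
lambda_reduce(Expression, _, Expression) :-
    Expression \= lambda(_, _).

% Example 1: a lambda-expression is reduced. The query
%   ?- lambda_reduce(lambda(X, feed1(f1, X, the1(n1, numbat1))),
%                    name(j1, 'John'), R).
% binds X = name(j1, 'John') and
%       R = feed1(f1, name(j1, 'John'), the1(n1, numbat1)).
%
% Example 2: anything else passes through the "bypass" clause. The query
%   ?- lambda_reduce(the1(n1, numbat1), anything, R).
% binds R = the1(n1, numbat1).
```

The result of the first example is the logical form that appears inside assert(...) in the s(1,5,...) sample output.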
As an example of the final performance of your system, you should be able to
handle all of the phrasal constituent types as exemplified below (note that
nlpbasis.pro has word/4 facts for "John feeds
the numbat" but no lexicon entries: writing those is left to you):
```
% prolog -s nlp-soln.pl
[...]
?- np(1,2,Syn,Sem,Var).
Syn = np(name(john)),
Sem = name(j1, 'John'),
Var = j1 ;            ← user presses ";"
false.

?- np(3,5,Syn,Sem,Var).
Syn = np(det(the), noun(numbat)),
Sem = the1(n1, numbat1),
Var = n1 ;
false.

?- vp(2,5,Syn,Sem,Var).
Syn = vp(v(feeds), np(det(the), noun(numbat))),
Sem = lambda(_G248, feed1(f1, _G248, the1(n1, numbat1))),
Var = f1 ;
false.

?- s(1,5,Syn,Sem,Var).
Syn = s(np(name(john)), vp(v(feeds), np(det(the), noun(numbat)))),
Sem = assert(feed1(f1, name(j1, 'John'), the1(n1, numbat1))),
Var = f1 ;
false.

?- control-D
%
```
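The exact identity of the variable in the lambda-expression (_G248 in the example) may vary: all that matters is that it is a variable. Likewise, the var and sem values shown (such as f1, n1 and feed1) come from the word facts and lexicon entries that are loaded, and differ from the illustrative facts given earlier; your code should simply pass through whatever values those facts supply.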
Note that you need to include the speech act operator
assert in the logical form, but you don't have to worry
about the tense operator (past / pres / fut).
Note that each grammar rule must be implemented by a single Prolog rule,
and that grammar rules that imply a lambda-reduction must use
lambda_reduce to do it.
The parts of speech to be covered are: aux det pro proper noun v (where pro signifies pronoun). You must use these names for the parts of speech. The grammar rules to be covered are:
You must implement the rule from the lecture notes:
Note that this is a recursive rule that can handle sequences of auxiliaries. You are not required to filter out invalid auxiliary sequences generated by this rule: "have can fed" is OK, just like "can have fed", for this assignment.
For example, with
```prolog
word(can, 10, 11, c10).
word(feed, 11, 12, f10).
word(the, 12, 13, t10).
word(numbat, 13, 14, n10).
```

```
?- vp(10, 14, Syn, Sem, Var).
Syn = vp(aux(can), vp(v(feed), np(det(the), noun(numbat)))),
Sem = lambda(_G293, can1(feed1(f10, _G293, the1(n10, numbat1)))),
Var = f10
```
The exact identity of the variable _G293 is unimportant.
For the vp "can have fed the numbat", the analysis would be:
```
?- vp(21,26,Syn,Sem,Var).
Syn = vp(aux(can), vp(aux(have), vp(v(fed), np(det(the), noun(numbat))))),
Sem = lambda(_G286, can1(have1(fed1(f2, _G286, the1(n2, numbat1))))),
Var = f2
```
The exact identity of the variable _G286 is unimportant. This
example also assumes that the var feature of the particular instance
of the word "numbat" in this VP is n2, and that the sem features of
"have" and "fed" are have1 and fed1 respectively.
For example, given the word fact word(she, 20, 21, s1) and the lexicon entry ispro(she, she1):

```
?- np(20,21,Syn,Sem,Var).
Syn = np(pro(she)),
Sem = pro(s1, she1),
Var = s1 ;
false.
```
ynqs(sem(yn_query(?semvp ?semnp))) → aux(sem(?semaux)) np(sem(?semnp)) vp(sem(?semvp))

A more complete version of the rule would use the aux to help determine the tense, but we won't worry about that in this assignment. Notice also that there are plenty of types of yes-no questions that are not covered by this rule, for example "Is John a lawyer?". Further, we are not worrying about parsing the question mark: it will simply be left out of our word sequence. An example is the sentence "Does John feed the numbat?":

```prolog
word(does, 100, 101, d100).
word(john, 101, 102, j100).
word(feed, 102, 103, f100).
word(the, 103, 104, t100).
word(numbat, 104, 105, n100).
```

```
?- ynqs(100, 105, Syn, Sem, Var).
Syn = ynqs(aux(does), np(name(john)), vp(v(feed), np(det(the), noun(numbat)))),
Sem = yn_query(feed1(f100, name(j100, 'John'), the1(n100, numbat1))),
Var = f100 ;
false.
```
vp(sem(lambda(X, ?semv(?varv, X, ?semnp2, recipient(?semnp1))))) → v(sem(?semv)) np(sem(?semnp1)) np(sem(?semnp2))

For example, with the VP "gives John the book", and these words and lexical entries:

```prolog
word(gives, 82, 83, g1).
word(john, 83, 84, j2).
word(the, 84, 85, t2).
word(book, 85, 86, b1).

isverb(gives, give1).
isname(john, 'John').
isdet(the, the1).
isnoun(book, book1).
```

the analysis should be like this:

```
?- vp(82, 86, Syn, Sem, Var).
Syn = vp(v(gives), np(name(john)), np(det(the), noun(book))),
Sem = lambda(_G294, give1(g1, _G294, the1(b1, book1), recipient(name(j2, 'John')))),
Var = g1 .
```

The exact identity of the variable _G294 is unimportant.
It is probably advisable to start work just with the four basic rules and associated parts of speech; later you can refine your program to add the extra parts of speech and rules.
When your completed assignment is tested by the marking system,
the lexicon entries and words used will not be those in the
version of nlpbasis.pro distributed with this
specification. The test data will use the same format as in the
distributed version of nlpbasis.pro. Thus you must not
modify the format of the lexicon entries and words!
You do not have to cope with things like past(feed1).
In order to get your rules for building lexical constituents right, you need to have the right format for the lexicon entries: here are example lexicon entries for each of the six required lexical categories:
```prolog
isaux(can, can1).
isdet(the, the1).
isname(jane, 'Jane').
isnoun(numbat, numbat1).
ispro(he, he1).
isverb(see, see1).
```
You are likely to need at least one word from every lexical
category that your grammar covers in your test lexicon in
nlpbasis.pro. If your lexicon does not contain any
pronoun, for example, and you try to parse something, you may get
an error message such as:

```
ERROR: Undefined procedure: ispro/2
```

even though you may have no pronoun in your test sentence. This happens because the parser may try the pronoun NP rule while searching for a parse, and in SWI-Prolog calling a predicate that has no definition at all is an error rather than a simple failure.
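One way to catch this early is a small check that lists which of the six lexicon predicates are not defined at all. The sketch below assumes SWI-Prolog (it uses the standard current_predicate/1, findall/3 and member/2); missing_lexicon/1 is a hypothetical helper name, not something the assignment asks for, and the facts are simply the six example entries shown above.

```prolog
% The six example lexicon entries, one per lexical category.
isaux(can, can1).
isdet(the, the1).
isname(jane, 'Jane').
isnoun(numbat, numbat1).
ispro(he, he1).
isverb(see, see1).

% missing_lexicon(-Missing): hypothetical helper. Missing is the list
% of lexicon predicates (all of arity 2) that are not defined, i.e.
% those that would raise an "Undefined procedure" error if called.
missing_lexicon(Missing) :-
    findall(P,
            ( member(P, [isaux, isdet, isname, isnoun, ispro, isverb]),
              \+ current_predicate(P/2)
            ),
            Missing).
```

Here ?- missing_lexicon(M). should give M = []. In a fresh session loaded from a version of the file without the ispro/2 fact, it should give M = [ispro], flagging the gap before it shows up as an error in the middle of a parse.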
When loading your code, the most common problem is a "singleton variable" warning, i.e. a variable that occurs just once in a clause. This is sometimes intentional, and sometimes a typographical error. If it is intentional, suppress the warning by prefixing an underscore "_" to the variable name; if it is a typo, correct the spelling instead.
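As a small illustration (assuming SWI-Prolog; in_input/1 is a hypothetical helper used only here), the clause below would trigger a singleton-variable warning naming From, To and Var if those three were written without the leading underscore, because each appears only once. Since they are deliberately unused, the underscore is the right fix:

```prolog
% Sample word fact, as it would appear in nlpbasis.pro.
word(john, 1, 2, j1).

% in_input(+Word): hypothetical helper, true if Word occurs anywhere
% in the input. _From, _To and _Var are deliberately unused; the
% leading underscore tells SWI-Prolog not to report them as singletons.
in_input(Word) :-
    word(Word, _From, _To, _Var).
```

By contrast, a misspelling such as word(Wrod, From, To, Var) in a clause that later uses Word would be reported as a singleton, and there the fix is to correct the name, not to add an underscore.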
Tip: To make it easier to tell when
you have warnings, try loading your code using the following variant
of the prolog command:

```
prolog -q -s nlp-soln.pl
```

The -q "flag" stands for "quiet", and it causes the Prolog
system to omit the welcome, copyright, and help information. This makes the
warnings and error messages, if any, much more obvious.
The actual allocation of marks may differ slightly from what is shown below.
| Criterion | Marks | Remarks |
| --- | --- | --- |
| Testing | 8 | Make sure your code works for all the sample inputs. Then create a completely different set of inputs, with associated word facts and lexicon entries, and make sure it works for those, too. When testing with the sample inputs, copy-paste the inputs to make sure you get them exactly right. |
| Readability & Style | 2 | Things to check include (and this is not a complete list): |
First run through the compliance check:

Have you loaded nlpbasis.pro into your nlp-soln.pl? Your solution should contain the directive :- ['nlpbasis.pro']. This line is in
nlptemplate.pro, so it will be in your solution unless you
deleted it. It should be on a line by itself.

Does your code load cleanly when you run prolog -q -s nlp-soln.pl? If there are warning or error messages, fix your code so that they do not occur.

Submit nlp-soln.pl using the UNIX command:

```
give cs9414 nlp nlp-soln.pl
```

This goes for both COMP9414 and COMP9814 students. Carefully check your submission.

Due date: 11.30pm on Friday of week 13 (Friday 1 June, 2012).

Don't forget to check that your code works on a CSE machine, in exactly the form that you will be handing it in, immediately before submission. Even if you just add an extra line of comments, re-test before submission! The work you submit must be your own, except where you acknowledge another source.
Copyright © UNSW & Bill Wilson, 2012.
UNSW's CRICOS Provider No. is 00098G