Copyright © Bill Wilson, 1998 - 2022 | Contact Info |
Other related dictionaries:
The Prolog Dictionary -
URL: http://www.cse.unsw.edu.au/~billw/prologdict.html
The Artificial Intelligence Dictionary -
URL: http://www.cse.unsw.edu.au/~billw/aidict.html
The Machine Learning Dictionary -
URL: http://www.cse.unsw.edu.au/~billw/mldict.html
The URL of this NLP Dictionary is: http://www.cse.unsw.edu.au/~billw/dictionaries/nlpdict.html
You should use The NLP Dictionary to clarify or revise concepts that
you have already met. The NLP Dictionary is not a suitable
way to begin to learn about NLP.
Further information on NLP can be found in the class web page
lecture notes section.
Other places to find out about artificial intelligence
include the AAAI (American Association for Artificial Intelligence)
AI Reference Shelf.
If you wish to suggest an item or items that should be included, or if you found an item that you felt was unclear, please let me know: (my contact info).
This dictionary is mainly limited to the NLP concepts covered or mentioned in COMP9414 Artificial Intelligence at the University of New South Wales, Sydney.
The symbol → is used to separate the type from the list of found constituents, and a dot is used to separate the list of found constituents from the list of types of constituents not yet found.
Example:
Here ARC2 is the name, NP is the type, DET1 ADJ1 is the list of found constituents, NOUN is the (single item) list of constituents not yet found, and the from and to positions are 0 and 2. This active arc would derive from the grammar rule NP → DET ADJ NOUN which says that an NP may be a DETerminer followed by an ADJective followed by a NOUN.
Contrast mood, tense,
and aspect.
Adjectives are also used as the complements
of sentences with
verbs like "be" and "seem" - "He is happy", "He seems drunk".
ADJ is a lexical grammatical category.
The longer ADJPs are most often found as complements of verbs such as
"be" and "seem".
ADJP is a phrasal grammatical category.
Many adverbs end with the morpheme -ly, which
converts an adjective X into an adverb meaning
something like "in an X manner" - thus "bravely" = "in a brave manner".
Other adverbs include intensifiers like
"very" and "extremely".
There are also adverbs of time (like "today", "tomorrow", "then" - as in
"I gave him the book then"), frequency
("never", "often"), and place ("here", "there", and "everywhere").
ADV is a lexical grammatical category.
Adverbial Phrase is a phrasal grammatical
category. Adverbial phrase is usually abbreviated to ADVP.
See also
context-free grammar, and
context-sensitive grammar.
There can be situations where more than one of these is present.
For a fairly complete and quite entertaining treatment of anaphora,
see Hirst, G. Anaphora in Natural Language Understanding: A Survey
Springer Lecture Notes in Computer Science 119, Berlin: Springer, 1981.
In Prolog, we would write something like:
Augmented grammar rules are also used to record sem and var features in
computing logical forms, and to express the relationship between
the sem and var of the left-hand side and the sem(s) and var(s) of
the right-hand side.
For example, for the rule vp → v (i.e. an intransitive verb), the
augmented rule with sem feature could be:
vp(sem(lambda(X, ?semv(?varv, X))), var(?varv)) →
where subcat none indicates that this only works
with an intransitive verb.
Complex groupings of auxiliaries can occur, as in "The child may
have been being taken to the movies".
Some auxiliaries
(do, be, and have)
can also occur as verbs in their own right.
Auxiliary verb is often abbreviated to AUX.
AUX is a lexical grammatical category.
Pr(A | B) = Pr(B | A) × Pr(A) / Pr(B)
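As a sanity check, Bayes' rule can be verified numerically. The probabilities below are invented for illustration; Pr(B) is expanded by the law of total probability.

```python
# Numeric illustration of Bayes' rule: Pr(A | B) = Pr(B | A) * Pr(A) / Pr(B).
# All probability values here are made up for the example.
pr_a = 0.01              # prior Pr(A)
pr_b_given_a = 0.9       # likelihood Pr(B | A)
pr_b_given_not_a = 0.05  # likelihood Pr(B | not A)

# Pr(B) via total probability over A and not-A
pr_b = pr_b_given_a * pr_a + pr_b_given_not_a * (1 - pr_a)

# Bayes' rule
pr_a_given_b = pr_b_given_a * pr_a / pr_b
print(round(pr_a_given_b, 4))
```

Note how a strong likelihood (0.9) still yields a modest posterior when the prior Pr(A) is small.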
The chart parser described in lectures is a bottom-up parser, and
can parse sentences, using any context-free grammar, in cubic time:
i.e., in time proportional to the cube of the number of words in the
sentence.
Notice that in "The pizza was eaten by Mary", "the pizza" becomes
the syntactic subject, whereas it was the syntactic object in
the equivalent sentence "Mary ate the pizza".
With semantic case, which is the primary sense in which we
are concerned with the term case in COMP9414, the focus
is on the meaning-relationship between the verb and the noun
or noun phrase. Since this does not change between "Mary ate
the pizza" and "The pizza was eaten by Mary", we want to use
the same syntactic case for "the pizza" in both sentences.
The term used for the semantic case of "the pizza" is
theme. Similarly, the semantic case of
"Mary" in both versions of the sentence is
agent. Other cases frequently used include
instrument,
coagent,
experiencer,
at-loc, from-loc, and to-loc,
at-poss, from-poss, and to-poss,
at-value, from-value, and to-value,
at-time, from-time, and to-time, and
beneficiary.
Semantic cases are also referred to as thematic roles.
See also chart parsing.
That algorithm will now be summarized:
to parse a sentence S using a grammar G and lexicon L:
to check if an active arc can have its dot advanced
Example: For the active arc ARC2: NP → ART1 . ADJ N from 2 to 3
if there is a constituent ADJ2: ADJ → "green" from 3 to 4 (so that the
to position, 3, and the type, ADJ, of the constituent of
the active arc immediately after the dot, match the from position,
3, and the type, ADJ, of the constituent ADJ2) then the active
arc ARC2 can be extended, i.e. have its dot advanced, creating a new
active arc, say ARC3: NP → ART1 ADJ2 . N from 2 to 4.
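The dot-advancing step above can be sketched in Python. The tuple layouts and names (extend_arc, ARC2, ADJ2) are illustrative only; the lecture algorithm itself is given in Prolog-flavoured pseudocode.

```python
# Sketch of the "advance the dot" step from the chart-parsing example.
# An active arc is (name, lhs, found, remaining, frm, to);
# a constituent is (name, cat, frm, to).
def extend_arc(arc, constituent, new_name):
    name, lhs, found, remaining, frm, to = arc
    cname, cat, cfrm, cto = constituent
    # The arc extends iff its end position matches the constituent's start
    # and the category just after the dot matches the constituent's category.
    if remaining and to == cfrm and remaining[0] == cat:
        return (new_name, lhs, found + [cname], remaining[1:], frm, cto)
    return None  # dot cannot be advanced with this constituent

arc2 = ("ARC2", "NP", ["ART1"], ["ADJ", "N"], 2, 3)
adj2 = ("ADJ2", "ADJ", 3, 4)
arc3 = extend_arc(arc2, adj2, "ARC3")
print(arc3)  # -> ('ARC3', 'NP', ['ART1', 'ADJ2'], ['N'], 2, 4)
```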
Named after the linguist Noam Chomsky.
There is a wide variety of complement structures. Some are
illustrated in the entry for subcategorization.
An example of an adjective with a complement is
"thirsty for blood", as in "The football crowd was thirsty for blood
after the home team was defeated." This is a PP-complement. Another
would be "keen to get out of the stadium", a TO-INF complement,
as in "The away-team supporters were keen to get out of the stadium."
The semantic system described in COMP9414 assumes
compositional semantics.
There are also subordinate conjunctions, like "if" and "when",
as in "I will play with you if you will lend me your marbles"
and "I will lend you this book when you return the last one
you borrowed".
Conjunctions may also be used to join nouns,
adjectives, adverbs, verbs, phrases ...
Conjunction is often abbreviated to CONJ.
CONJ is a lexical grammatical category.
See active arc. When an
active arc is completed (when all its sub-constituents are found),
the active arc becomes a constituent.
Constituents are used to create new active arcs - when there is
a constituent X1 of type X, and a grammar
rule whose right hand side starts with the grammar symbol X,
then a new active arc of type X may be created, with the constituent
X1 listed as a found constituent for the active arc (the only one,
so far).
The components of a constituent, as recorded in the
chart parsing algorithm described in lectures,
are:
Context-sensitive grammars are more powerful than context-free grammars,
but they are much harder to work with.
See also statistical NLP.
D
See also here.
Sometimes discourse entities have a more complex relation to
the text. For example, in "Three boys each bought a pizza",
clearly "Three boys" gives rise to a DE that is a set of three
objects of type boy (B1: |B1| = 3 and B1 subset_of
{x|Boy(x)}), but "a pizza", in this context, gives rise to a
representation of a set P1 of three pizzas (whereas in the usual
case "a pizza" would give rise to a DE representing a single
pizza.)
Ditransitive verbs can appear with just one or even no syntactic
objects ("I gave two dollars", "I gave at the office") - their distinguishing
characteristic is that they can have two objects, unlike
intransitive and
transitive verbs.
Here is an incomplete list of ditransitive verbs in English.
Ellipsis causes problems for NLP since it is necessary to infer
the rest of the sentence from the context.
"Ellipsis" is also the name of the symbol "..." used when
something is omitted from a piece of text, as in
"Parts of speech include nouns, verbs, adjectives, adverbs,
determiners, ... - the list goes on and on." "Elliptical" is the
adjectival form of "ellipsis".
See also anaphor.
would be read as "for some entity X, X likes spinach" or just
"something likes spinach". This might be too broad a statement,
as it could be satisfied, for example, by a snail X that liked spinach.
It is common
therefore to restrict the proposition to something like:
i.e. "Some person likes icecream." That is, we are restricting the type
of X to persons. In some cases, it is more reasonable
to abbreviate the type restriction as follows:
See also forall, Skolem functions
and this riddle.
"I" and "we" are first-person pronouns,
as are "me", "us". Other words with the first-person feature include
"mine", "my", "myself", "ours", "our", and "ourselves".
would be read as "for every entity X, X likes icecream" or just
"everything likes icecream". This would be too broad a statement,
as it would allege that, for example, rocks like icecream. It is usual
therefore to restrict the proposition to something like:
i.e. "Every person likes icecream." That is, we are restricting the type
of X to persons. In some cases, it is more reasonable
to abbreviate the type restriction as follows:
See also exists.
See also Chomsky hierarchy.
Using this model, the probability of generating "dogcatchers catch
old red fish" can be calculated as follows: first work out the probability
of the lexical category sequence
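The entry breaks off before the numbers. As a hedged sketch of the kind of calculation meant, one common model multiplies bigram probabilities over the lexical-category sequence and then the probability of each word given its category. Every probability below is invented for illustration.

```python
# Sketch: probability of generating "dogcatchers catch old red fish" under a
# category-bigram model. All probability values are invented for illustration.
cat_bigram = {("<s>", "N"): 0.4, ("N", "V"): 0.5, ("V", "ADJ"): 0.1,
              ("ADJ", "ADJ"): 0.2, ("ADJ", "N"): 0.6}
word_given_cat = {("dogcatchers", "N"): 1e-4, ("catch", "V"): 1e-3,
                  ("old", "ADJ"): 5e-3, ("red", "ADJ"): 4e-3,
                  ("fish", "N"): 2e-3}

words = ["dogcatchers", "catch", "old", "red", "fish"]
cats = ["N", "V", "ADJ", "ADJ", "N"]

p = 1.0
prev = "<s>"  # sentence-start marker
for w, c in zip(words, cats):
    # probability of the next category, times probability of the word given it
    p *= cat_bigram[(prev, c)] * word_given_cat[(w, c)]
    prev = c
print(p)
```

The product is tiny, as expected: each factor is a probability well below 1, so longer sentences get exponentially smaller generation probabilities.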
Here is a list of 300+
ill-formed sentences.
See also here for the distinction between "infinite"
and "infinitive".
Some words inflect regularly, and some inflect irregularly, like the plural
form "children" of "child", and the past tense and past participle forms
"broke" and "broken" of the verb "break".
INTERJ is a lexical grammatical category.
It usually appears as a single word utterance, indicating some strong
emotion or reaction to something. Examples include: "Oh!", "Ouch!",
"No!", "Hurray!" and a range of blasphemies and obscenities, starting
with "Damn!".
Prolog code for lambda-reduction is:
"pig": N V ADJ.
"pig" is familiar as a N, but also occurs as a verb
("Jane pigged herself on pizza") and an adjective, in the phrase "pig iron",
for example.
Contrast with phrasal category.
Variables in logical form language, unlike in FOPC, persist beyond the
"scope" of the quantifier. E.g. "A man came in. He went to the table."
The first sentence introduces a new object of type man1. The "He"
in the second sentence refers to this object.
NL quantifiers are typically restricted in the range of objects that the variable
ranges over. In "Most dogs bark", the variable in the most1 quantifier
is restricted to dog1 objects: most1(d1 : dog1(d1), barks1(d1)).
With tenses, we use the modal operators pres, past, fut, as in:
pres(sees1)(john1, fido1)
Thus if likes1(jack1, sue1) is a formula in the logical form language, then
we can construct logical forms like know(mary1, likes1(jack1, sue1))
meaning that Mary knows that Jack likes Sue. Similarly for
believe(mary1, likes1(jack1, sue1)) and
want(marg1, own(marg1, (?obj : &(porsche1(?obj), fire_engine_red(?obj)))))
- that's Marg wants to own a fire-engine red Porsche.
See also failure of substitutivity.
In fact, noun modifier is a synonym for nominal.
Noun is often abbreviated to N.
N is a lexical grammatical category.
In some languages other than English, there may be different
distinctions drawn - some languages distinguish between one, two,
and many, rather than just one and many as in English.
Nouns in English are mostly marked for number - see
plural.
Pronouns and certain
determiners may also be marked for number.
For example, "this" is singular, but "these" is plural,
and "he" is singular, while "they" is plural.
See also ditransitive,
transitive,
and intransitive.
If so, it produces as output some kind of representation
of the way (or ways*) in which the sentence can be derived
from the grammar and lexicon. A common way of doing this
is to output (a) parse tree(s).
"Parsing" means executing a parser.
* if the sentence is ambiguous - that is, if it can be derived
from the grammar and lexicon in more than one way, then multiple
parse trees will be produced. See also
ambiguity.
Participles are used in constructing tensed, progressive,
and passive forms of verbs,
as in "he had hired [her]", "she was hiring [them]", and "you are hired",
and also as though they were
adjectives in phrases like "a flying horse" and "a hired man".
In some cases, present participles have become accepted as nouns
representing an instance of the action that the underlying verb
describes, as with "meeting".
PRESPART and PASTPART are
lexical grammatical categories.
See also phrasal verb.
Below is a table of the forms of pronouns, etc. in English, classified by
person and syntactic case:
Contrast with lexical category.
See also phrasal categories.
It is also possible for a sentence to be well-formed at the
lexical, syntactic, and semantic levels, but ill-formed at the
pragmatic level because it is inappropriate or inexplicable
in context. For example, "Try to hit the person next to you
as hard as you can" would be pragmatically ill-formed in almost
every conceivable situation in a lecture on natural language
processing, except in quotes as an example like this. (It might,
however, be quite appropriate in some settings at a martial
arts lesson.)
On general context-free grammars,
a vanilla predictive parser takes exponential parsing time (i.e. it can
be very very slow). See also bottom-up
parsers.
break1(e1, agent[name(j1, 'John')] theme[pro(i1, it1)] instr[the<h1, hammer1>])
s(P1, P3, Agr) :- np(P1, P2, Agr), vp(P2, P3, Agr).
Actually, this is too tough - the agr feature of a VP, in particular,
is usually fairly ambiguous - for example the verb "love" (and
so any VP of which it is the main verb) has agr=[1s,2s,1p,2p,3p],
and we would want it to agree with the NP "we" which has agr=[1p].
This can be achieved by computing the intersection of the agr
of the NP and the VP and setting the agr of the S to be this
intersection, provided it is non-empty. If it is empty, then the
S goal should not succeed.
s(P1, P3, SAgr) :-
np(P1, P2, NPAgr),
vp(P2, P3, VPAgr),
intersection(NPAgr, VPAgr, SAgr),
nonempty(SAgr).
where intersection computes
the intersection of two lists (regarded as sets) and binds the
third argument to this intersection, and nonempty succeeds
if its argument is not the empty list.
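The same intersection idea can be sketched in Python (the dictionary's version is in Prolog). The feature values like "1s" follow the entry's notation; the function name s_agr is illustrative.

```python
# Agreement checking by intersecting agr feature lists, as in the Prolog
# rule above: the S's agr is the intersection of the NP's and VP's agr,
# and parsing fails if that intersection is empty.
def s_agr(np_agr, vp_agr):
    """Return the S's agr (a non-empty intersection), or None on failure."""
    common = [a for a in np_agr if a in vp_agr]
    return common if common else None

love_agr = ["1s", "2s", "1p", "2p", "3p"]  # agr of VPs headed by "love"
we_agr = ["1p"]                            # agr of the NP "we"

print(s_agr(we_agr, love_agr))   # ['1p']  -> "we love ..." agrees
print(s_agr(["3s"], love_agr))   # None    -> "*he love ..." fails
```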
v(subcat(none), sem(?semv), var(?varv))
Auxiliary - Example
do / does / did - "I did read"
have / has / had / having - "He has read"
be / am / are / is / was / were / been / being - "He is reading"
shall / will / should / would - "He should read"
can, could - "She can read"
may, might, must - "She might read"
The resulting new active arc will be:
ARCy: C → C[1] ... C[j+1] .
C[j+2] ... C[n]
from m to n
where y is a natural number that has not yet been used in an
arc-name.
Examples:
nouns - Boys and girls [come out to play].
adjectives - [The team colours are] black and yellow.
adverbs - [He was] well and truly [beaten].
verbs - [Mary] played and won [her match].
phrases - across the river and into the trees;
[She] fell down and hit her head.
component - example (for the constituent NP1: NP → ART1 ADJ1 N1 from 0 to 3)
name - NP1 (usually formed from the type + a number)
type - NP (a phrasal or lexical category of the grammar)
decomposition - ART1 ADJ1 N1
(ART1, ADJ1 and N1 would be the names of other constituents already found)
from - 0 (sentence position of the left end of this NP)
to - 3 (sentence position of the right end of this NP)
P - a set of grammar rules or productions, that is,
items of the form X → a, where X is
a member of the set N, that is, a non-terminal symbol, and
a is a string over the alphabet A.
An example would be the rule NP → ART ADJ N, which signifies that
a Noun Phrase can be an ARTicle followed by an ADJective followed
by a Noun, or N → horse, which signifies that horse
is a Noun. NP, ART, ADJ, and N are all non-terminal symbols, and horse
is a terminal symbol.
A - the alphabet of the grammar, equal to the disjoint union
of N and T.
N - the set of non-terminal symbols (i.e. grammatical or
phrasal categories).
T - the set of terminal symbols (i.e. words of the language
that the grammar defines).
S - a distinguished non-terminal, normally interpreted as
representing a full sentence (or program, in the case of a programming
language grammar).
E
S ⇒ NP VP (rule 1)
⇒ ART N VP (rule 2)
⇒ the N VP (rule 4)
⇒ the cat VP (rule 5)
⇒ the cat V (rule 3)
⇒ the cat miaowed (rule 6)
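The derivation can be replayed mechanically as successive string rewrites. The rule numbering follows the example; the rule bodies are inferred from the derivation steps and are illustrative.

```python
# Replaying the derivation of "the cat miaowed" as leftmost rewrites.
# Rule bodies are inferred from the derivation shown in the entry.
rules = {1: ("S", ["NP", "VP"]), 2: ("NP", ["ART", "N"]), 3: ("VP", ["V"]),
         4: ("ART", ["the"]), 5: ("N", ["cat"]), 6: ("V", ["miaowed"])}

def apply_rule(sentential_form, rule_no):
    lhs, rhs = rules[rule_no]
    i = sentential_form.index(lhs)  # rewrite the leftmost occurrence of lhs
    return sentential_form[:i] + rhs + sentential_form[i + 1:]

form = ["S"]
for rule_no in [1, 2, 4, 5, 3, 6]:  # the order used in the derivation above
    form = apply_rule(form, rule_no)
print(" ".join(form))  # -> the cat miaowed
```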
P1 = {p | pizza(p) and exists(b) : Boy(b) and
p = pizza_bought_by(b)}.
The function "pizza_bought_by" is the
Skolem function referred to in lectures as "sk4".
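The Skolem-function reading can be made concrete with a toy model. The individuals and the mapping pizza_bought_by below are invented for illustration; "sk4" is the lectures' name for this function.

```python
# Toy model of "Three boys each bought a pizza": the Skolem function maps
# each boy to the particular pizza he bought. All individuals are invented.
boys = {"b1", "b2", "b3"}
pizza_bought_by = {"b1": "p1", "b2": "p2", "b3": "p3"}  # the Skolem function

# P1 = {p | pizza(p) and exists b : Boy(b) and p = pizza_bought_by(b)}
P1 = {pizza_bought_by[b] for b in boys}
print(sorted(P1))  # -> ['p1', 'p2', 'p3']
```

If two boys had shared a pizza, the function would map both to the same pizza and P1 would be smaller than the set of boys, which is exactly the flexibility the Skolem-function representation allows.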
G
H
type - masculine / feminine / neuter - example
pronoun (nominative) - he / she / it - "He hit the ball."
pronoun (accusative) - him / her / it - "Frank hit him."
pronoun (possessive adjective) - his / her / its - "Frank hit his arm."
pronoun (possessive) - his / hers / its - "The ball is his."
pronoun (reflexive) - himself / herself / itself - "Frank hurt himself."
I
For example, var is a head feature for a range of phrasal categories,
including S. This means that an S gets its var feature by copying the
var feature of its head subconstituent, namely its VP.
Head features are discussed on pages 94-96 of Allen.
J
K
L
M
lambda_reduce(lambda(X, Predicate), Argument, Predicate) :-
    X = Argument.
Applying this to an actual example:
: lambda_reduce(
lambda(X, eats(e1, X, the1(p1, pizza1))),
name(m1, 'Mary'),
Result) ?
X = name(m1, 'Mary')
Result = eats(e1, name(m1, 'Mary'), the1(p1, pizza1))
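The dictionary's lambda_reduce is Prolog; the same beta-reduction step can be sketched in Python on a tiny term representation (tuples for compound terms, strings for symbols). This naive substitution does not handle variable capture, which the one-variable examples here never need.

```python
# Sketch of lambda-reduction: apply lambda(X, Body) to Argument by
# substituting Argument for X throughout Body. A lambda term is a pair
# (variable_name, body); compound terms are tuples, symbols are strings.
def lambda_reduce(lam, argument):
    var, body = lam
    def subst(term):
        if term == var:
            return argument                       # replace the bound variable
        if isinstance(term, tuple):
            return tuple(subst(t) for t in term)  # recurse into compounds
        return term                               # other symbols unchanged
    return subst(body)

# lambda(X, eats(e1, X, the1(p1, pizza1))) applied to name(m1, 'Mary')
lam = ("X", ("eats", "e1", "X", ("the1", "p1", "pizza1")))
result = lambda_reduce(lam, ("name", "m1", "Mary"))
print(result)  # -> ('eats', 'e1', ('name', 'm1', 'Mary'), ('the1', 'p1', 'pizza1'))
```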
past(sees1)(john1, fido1)
fut(sees1)(john1, fido1)
N
The tense operators include fut, pres, and past, representing future,
present and past. For example, fut(likes1(jack1, sue1)) would represent
"Jack will like Sue".
Mood - Description - Example
indicative - a plain statement - "John eats the pizza."
imperative - a command - "Eat the pizza!"
WH-question - a question with a phrasal answer, often starting with a
question-word beginning with "wh" - "Who is eating the pizza?",
"What is John eating?", "What is John doing to the pizza?"
Y/N-question - a question with a yes/no answer - "Did John eat the pizza?"
subjunctive - an embedded sentence that is counter-factual but must be
expressed to, e.g., explain a possible consequence - "If John were to eat
more pizza he would be sick."
O
Contrast
verb,
adjective,
adverb,
preposition,
conjunction, and
interjection.
adjectives,
nominal modifiers (i.e. other nouns, acting as though
they were adjectives),
certain kinds of adverbs that modify the adjectives,
as with "very" in "very bright lights",
participles functioning as adjectives (as in "hired man" and "firing squad"),
cardinals,
ordinals,
determiners,
and quantifiers.
There are constraints on the way these ingredients can be put together.
Here are some examples of noun phrases:
"Ships" (as in "Ships are expensive to build"),
three ships (cardinal + noun),
all three ships (quantifier + cardinal + noun),
the ships (determiner + noun),
enemy ships (nominal + noun),
large, grey ships (adjective + adjective + noun),
the first three ships (determiner + ordinal + cardinal + noun),
my ships (possessive + noun).
P
s(np(pro("He")),
vp(v("ate"),
np(art("the"), n("pizza")))).
case - first person - second person - third person
nominative - I/we - thou/you/ye - he/she/it/they
accusative - me/us - thee/you/ye - him/her/it/them
possessive adjective - my/our - thy/your - his/her/its/their
possessive - mine/ours - thine/yours - his/hers/its/theirs
reflexive - myself/ourselves - thyself/yourself/yourselves - himself/herself/itself/themselves
take in - deceive - "He was taken in by the swindler."
take in - help, esp. with housing - "The homeless refugees were taken in
by the Sisters of Mercy."
take up - accept - "They took up the offer of help."
take off - remove - "She took off her hat."
person & number - possessive adjective - possessive pronoun
first person singular - my - mine
first person plural - our - ours
second person singular - thy - thine
second person (modern) - your - yours
third person singular - his/her/its - his/hers/its
third person plural - their - theirs