Introduction to Natural Language Processing
Reference: Allen, chapter 2
| Aim: |
|
To review the grammar of English, introducing some terms for describing different
types of English phrases, and the concept of a grammar rule. We also have a
quick look at how the different levels of linguistic knowledge interact.
|
| Keywords:
abstract noun,
active voice,
ADJ,
adjective,
adjective phrase,
ADJP,
ADV,
adverb,
adverbial phrase,
ADVP,
agreement,
apposition,
article,
aspect,
AUX,
auxiliary verb,
BELIEVE,
bitransitive,
bound morpheme,
cardinal,
case,
common noun,
concrete noun,
CONJ,
conjunction,
count noun,
declarative,
demonstrative,
descriptive grammar,
determiner,
ellipsis,
embedded sentence,
features in NLP,
first person,
free morpheme,
FUT,
future perfect,
gender,
grammar,
imperative,
indicative,
infinitive,
inflection,
intensifier,
INTERJ,
interjection,
intransitive,
lexeme,
mass noun,
morpheme,
morphology,
N,
nominal,
noun,
noun modifier,
noun phrase,
NP,
number (grammatical),
object,
ordinal,
participle,
particle,
passive voice,
PAST,
past perfect,
person,
phone,
phoneme,
phonetics,
phonology,
phrasal verb,
phrase,
pluperfect,
plural noun,
possessive,
PP,
PP attachment,
pragmatics,
predicate,
PREP,
preposition,
prepositional phrase,
PRES,
prescriptive grammar,
present perfect,
progressive,
proper noun,
proposition,
qualifier,
quantifier,
quantifying determiner,
relative clause,
S,
second person,
sentence,
simple future,
simple past,
simple present,
singular noun,
speech act,
string,
subject,
subjunctive,
syntax,
tense,
third person,
transitive,
V,
verb,
verb complement,
verb group,
verb phrase,
VP,
wh-question,
word,
y/n question
|
NLP Intro Plan
| Plan: |
- overview of linguistics. Our focus: lexicon, morphology, syntax, semantics, reference
We will only look at the simple basics of English syntax.
- parts of speech and refinements (e.g. mass and count nouns)
- phrase types (e.g. noun phrase and verb phrase)
- inflection: -ing, -ed, -est
- grammar rules: NP → DET ADJ N
- tense, aspect, active/passive, transitivity
- complement structure for verbs and adjectives
|
* note: in past versions of these notes, ART (article) has been used instead of DET (determiner). If any instances of ART have been missed in updating the notes, treat them as though they were DET for the purposes of this course.
Topics in NLP
| Introduction |
- applications
- linguistics overview
- ambiguity
- example systems
| Syntax |
- linguistic knowledge
- grammars and parsing
| Semantics |
- logical form
- semantic interpretation
| Pragmatics |
- reference
- discourse processing
Typical Applications
- database query languages
- machine (assisted) translation
weather reports, Canadian Hansard
- information extraction
e.g. financial reports, news article summaries
- all of these operate in limited domains
Related Disciplines
| Linguistics | study of language and of languages |
| Psycholinguistics | language and the mind, models of human language processing |
| Neurolinguistics | neural-level models of language processing |
| Logic | an unambiguous formal language useful for representing (unambiguous) meanings |
Map of the Discipline of Linguistics
SR = Speech Recognition; NLP = Natural Language Processing

Australian-English Phonemes
Major Issue - Ambiguity
Structural or Syntactic Ambiguity
The man saw the boy on the hill with the telescope.
Word-Sense Ambiguity
dog:
1. domesticated carnivore Canis familiaris
2. animals related to 1.
3. the male of such an animal (as opposed to bitch)
4. various unrelated but vaguely similar animals (e.g. prairie dog)
5. a despicable fellow
6. a fellow in general (you old dog, you)
7. a mechanical device for gripping or holding (dog the hatches)
8. an andiron
and more.
Referential Ambiguity
Steven hated Paul. He hit him.
he = Steven, or he = Paul ?
Eliza/Doctor (1966)
Men are all alike.
In what way?
They're always bugging us about something.
Can you think of a specific example?
Well, my boyfriend made me come here.
Your boyfriend made you come here?
He says I'm depressed much of the time.
I am sorry to hear you are depressed.
It's true - I am unhappy.
Do you think coming here will help you not to be unhappy?
I need some help.
...
Earlier you said you were unhappy.
Eliza 2
Eliza ...
- Was an early rule-based system.
- Matched patterns (I like $Object) in the input,
discarded inessential bits, turned what was left around
(I like pizza → you like pizza), and added an
"attitude" (I'm sorry to hear you like pizza).
- Has a memory, so material can be re-introduced some time later.
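The match, turn-around and "attitude" steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rule table, the swap list and the function names (RULES, SWAPS, reflect, respond) are hypothetical and are not taken from the original Eliza script, and the memory feature is not modelled.

```python
import re

# Hypothetical rule table: (pattern, response template). Not the original script.
RULES = [
    (re.compile(r"\bI like (?P<obj>.+)", re.IGNORECASE),
     "I'm sorry to hear you like {obj}."),
    (re.compile(r"\bmy (?P<obj>.+)", re.IGNORECASE),
     "Your {obj}?"),
]

# First/second person swaps used to "turn around" the matched text.
SWAPS = {"i": "you", "me": "you", "my": "your", "am": "are",
         "you": "I", "your": "my"}


def reflect(text):
    """Drop final punctuation and swap first and second person words."""
    words = text.rstrip(".!?").split()
    return " ".join(SWAPS.get(w.lower(), w) for w in words)


def respond(sentence):
    """Return the response for the first rule whose pattern matches."""
    for pattern, template in RULES:
        match = pattern.search(sentence)
        if match:
            return template.format(obj=reflect(match.group("obj")))
    return "Please go on."  # fallback when no pattern matches


print(respond("I like pizza"))
print(respond("Well, my boyfriend made me come here."))
```

Everything before the matched pattern ("Well, ") is simply discarded, which is the "inessential bits" step in miniature.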
My favourite exchange from Eliza:
...
My sister is going out with a fish.
Are any other members of your family going out with a fish?
Syntax
| Reference: | Allen, Chapter 2 |
| Aim: |
To review linguistic knowledge, introducing some terms for describing different types of English phrases, and the concept of a grammar rule.
|
| Plan: |
- words; types of words
- phrase types
- grammar rules: NP → DET ADJ NOUN
- inflection: -ing, -ed, -est
- tense, aspect, active/passive, transitivity
- complement structure for verbs and adjectives
|
Words and Word Types
- let words be our atomic units of language
- words can be classified:
- nouns = objects, concepts: cat, George, justice
- verbs = actions, states, attitudes: buy, believe
- adjectives = object properties: big, red
- adverbs = event properties: quickly
- determiners: a, the, this, both, all, some
- prepositions: of, in, before, under
- conjunctions: and, or, if, than
- interjections, etc.: arggh, bother, s**t
- The first four are the open classes, the rest are
closed classes. Neologisms normally belong to the open
classes.
Subclassification
- Some of these can be subclassified
-
| proper nouns | abstract nouns | mass nouns | count nouns |
| Iraq | philosophy | sand | apple |
-
| intransitive | transitive | bitransitive (ditransitive) verbs |
| laugh | kill | give |
| you laugh | you kill an ant* | you give him a book |
* NB: this is bad karma, and the staff of COMP9414 will not be held responsible if you
take such a course of action
Phrase Types
- subject and predicate
The cat ate the mouse
- usually called NP and VP (Noun Phrase and Verb Phrase) in NLP
- VPs often contain an NP (like the mouse, above)
- prepositional phrases (PP)
with a cheese sauce | for dinner | of the United States
- a PP usually consists of a preposition followed by an NP
- exception: 's phrases: Samuel's = NP followed by 's
- some languages use post-positions instead of prepositions, e.g. Japanese
- sentence (S)
- can be embedded: I believe the cat ate the mouse
Grammar
- A grammar is a formal description of the structure of a language
(usually as spoken or written by native speakers/writers)
- This notion of grammar is also called descriptive grammar
- Prescriptive grammar means rules for a high-status variant of
a language e.g. 'don't split infinitives', 'say "isn't", not "ain't"'
We are not interested here in prescriptive grammar.
Sentence Forms
| declarative (indicative) | John is listening |
| yes/no question (interrogative) | Is John listening? |
| wh-question (interrogative) | When is John listening? |
| imperative | Listen, John! |
| subjunctive | If John were listening, he might hear something to his advantage |
The subjunctive mood often describes a counter-factual situation
- that is, it describes a situation that is not a fact
- in our example of the subjunctive form, John is not listening.
Noun Phrases (NPs)
Grammar Rules
- The structure of a phrase can be described in terms of its
sequence of parts of speech.
- One type of noun phrase:
NP → DET ADJ NOUN
DET = determiner; ADJ = adjective.
Read this as "a Noun Phrase can be a DETerminer followed by an
ADJective followed by a NOUN." E.g. a vicious dog
There are many other NP structures and hence other NP rules. Here are two more:
NP → DET NOUN
NP → QUANT CARD NOUN
e.g. all three tutors
Grammar Rules 2
- Abbreviated Notation: we can combine our three NP rules using the | symbol
(= "or" in this context):
NP → DET NOUN | DET ADJ NOUN | QUANT CARD NOUN
- With NP defined (and PREP = preposition)
we can define prepositional phrase (PP):
PP → PREP NP | NP 's
- Over-Generation: Sometimes such rules generate strings of words which are not valid,
like NP → QUANT CARD NOUN ⇒ *both seven horse.
- Some Rules for VP (verb phrase) and S (sentence):
VP → V | V NP | V NP NP
i.e. A VP can be just a Verb, or a Verb followed by an NP,
or a Verb followed by two NPs.
S → NP VP | AUX NP VP "?" | WH AUX NP VP "?"
e.g.
| Rule | Example |
| S → NP VP | Time flies |
| S → AUX NP VP "?" | Did you go? |
| S → WH AUX NP VP "?" | When did you go? |
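One way to make these rules concrete is to store them as data and check whether a string of words can be derived from a category. The sketch below is a minimal top-down recognizer, not a full parser. The toy LEXICON and the function names spans and accepts are assumptions made for illustration. Words missing from the lexicon are passed through as literal tokens, so 's and ? can appear in rules.

```python
# Rules as given in the notes; "'s" and "?" are literal tokens.
GRAMMAR = {
    "NP": [["DET", "NOUN"], ["DET", "ADJ", "NOUN"], ["QUANT", "CARD", "NOUN"]],
    "PP": [["PREP", "NP"], ["NP", "'s"]],
    "VP": [["V"], ["V", "NP"], ["V", "NP", "NP"]],
    "S": [["NP", "VP"], ["AUX", "NP", "VP", "?"],
          ["WH", "AUX", "NP", "VP", "?"]],
}

# Hypothetical toy lexicon: one lexical category per word.
LEXICON = {
    "a": "DET", "the": "DET", "vicious": "ADJ",
    "dog": "NOUN", "horse": "NOUN", "tutors": "NOUN",
    "all": "QUANT", "both": "QUANT", "three": "CARD", "seven": "CARD",
    "chased": "V", "barked": "V",
}


def spans(symbol, tags, start):
    """Yield each end position such that tags[start:end] can be a `symbol`."""
    if symbol not in GRAMMAR:  # lexical category or literal token
        if start < len(tags) and tags[start] == symbol:
            yield start + 1
        return
    for rhs in GRAMMAR[symbol]:
        ends = {start}
        for part in rhs:  # extend every partial match by the next part
            ends = {e2 for e in ends for e2 in spans(part, tags, e)}
        yield from ends


def accepts(symbol, words):
    """True if the whole space-separated word string can be a `symbol`."""
    tags = [LEXICON.get(w.lower(), w) for w in words.split()]
    return len(tags) in set(spans(symbol, tags, 0))


print(accepts("NP", "a vicious dog"))
print(accepts("NP", "both seven horse"))  # over-generation: also accepted
```

The second call illustrates over-generation: the rule NP → QUANT CARD NOUN accepts *both seven horse because nothing in the rule checks number.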
Verb Groups (VG)
- simplest sentences: Subject + Verb Group
| Subject | Verb Group |
| Harry | understands |
| Harry | will have been punished |
| | auxiliaries: will have been; head: punished |
- So Verb Group = (possibly) auxiliary verbs + head verb
- AUX = auxiliary verb (or just "auxiliary")
Examples: do, does, did, can, could, was, is, have, been, being, shall, will, should
Many auxiliaries can also be used as main verbs in their own right: I did my homework
Inflections, Tense
The head verb is (may be) inflected:
| INF | PRES PART | PRES TENSE | PAST TENSE | PAST PART | TYPE |
| eat | eating | eats | ate | eaten | IRREGULAR (STRONG) much change |
| set | setting | sets | set | set | IRREGULAR (STRONG) little change |
| - | - | can | could | - | IRREGULAR (MISSING FORMS) |
| be | being | am/is/are | was/were | been | IRREGULAR (VERB "to be") |
| kill | killing | kills | killed | killed | REGULAR (WEAK) |
INF = INFINITIVE
PRES = PRESENT
PART = PARTICIPLE
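The regular/irregular split in the table suggests a common implementation strategy: look irregular verbs up in a table, and apply a suffix rule to everything else. The sketch below assumes that strategy. The names IRREGULAR and inflect are hypothetical, and the regular rule is deliberately naive: it strips a final e before -ing and -ed but ignores consonant doubling and other spelling changes.

```python
# Irregular forms copied from the table above:
# (PRES PART, PRES TENSE, PAST TENSE, PAST PART)
IRREGULAR = {
    "eat": ("eating", "eats", "ate", "eaten"),
    "set": ("setting", "sets", "set", "set"),
}

FORM_NAMES = ("PRES PART", "PRES TENSE", "PAST TENSE", "PAST PART")


def inflect(infinitive):
    """Return the four inflected forms of a verb as a dict."""
    if infinitive in IRREGULAR:
        forms = IRREGULAR[infinitive]
    else:
        # Naive regular (weak) rule: drop a final "e" before -ing / -ed.
        stem = infinitive[:-1] if infinitive.endswith("e") else infinitive
        forms = (stem + "ing", infinitive + "s", stem + "ed", stem + "ed")
    return dict(zip(FORM_NAMES, forms))


print(inflect("kill"))
print(inflect("eat"))
```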
Auxiliaries, Modals, Tense, Aspect
- auxiliary verbs may be
- forms of be, do, and have, or
- modal auxiliaries such as can, could, will, shall, should
- the -ing inflection controls the progressive aspect
- the auxiliary do (and its variants) controls emphasis
- a combination of auxiliaries and inflections determines tense
| simple present | He eats the pizza |
| simple past | She ate the pizza |
| simple future | He will eat the pizza |
| present perfect | She has eaten the pizza |
| future perfect | He will have eaten the pizza |
| pluperfect (past perfect) | She had eaten the pizza |
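The way auxiliaries and inflections combine to give each tense can be written down as a small lookup. The sketch below builds the verb group for each of the six tenses listed above, assuming a third person singular subject. The FORMS table and the function name verb_group are hypothetical.

```python
# Hypothetical table of verb forms; only "eat" is filled in.
FORMS = {
    "eat": {"inf": "eat", "pres": "eats", "past": "ate", "past_part": "eaten"},
}


def verb_group(verb, tense):
    """Build the verb group for a third person singular subject."""
    f = FORMS[verb]
    patterns = {
        "simple present": f["pres"],
        "simple past": f["past"],
        "simple future": "will " + f["inf"],
        "present perfect": "has " + f["past_part"],
        "future perfect": "will have " + f["past_part"],
        "pluperfect": "had " + f["past_part"],
    }
    return patterns[tense]


for tense in ("simple present", "future perfect", "pluperfect"):
    print(tense + ":", "He", verb_group("eat", tense), "the pizza")
```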
Passive Voice
- transitive and bitransitive verbs can be in passive form:
| Active Form | Passive Form (if any) |
| Jim eats buns | Buns are eaten by Jim |
| Dan helps the student | The student is helped by Dan |
| but Bill glares | * Is glared by Bill |
An asterisk (*) in front of a linguistic example means the
example is ungrammatical or otherwise unacceptable. ("?"
means the example is doubtful in some way.)
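The active-to-passive pattern in the table can be sketched as a function that refuses to build a passive for an intransitive verb. This is a simplification under stated assumptions: the VERBS table and the function name passive are hypothetical, the caller says whether the object is plural, and pronoun case (he/him) is ignored.

```python
# Hypothetical verb table: past participle and whether the verb takes an object.
VERBS = {
    "eats": {"past_part": "eaten", "transitive": True},
    "helps": {"past_part": "helped", "transitive": True},
    "glares": {"past_part": "glared", "transitive": False},
}


def passive(subject, verb, obj=None, obj_plural=False):
    """Return the passive sentence, or None if the verb has no passive form."""
    entry = VERBS[verb]
    if not entry["transitive"] or obj is None:
        return None  # intransitive verbs have no object to promote
    be = "are" if obj_plural else "is"  # "be" agrees with the new subject
    return f"{obj[0].upper()}{obj[1:]} {be} {entry['past_part']} by {subject}"


print(passive("Jim", "eats", "buns", obj_plural=True))
print(passive("Dan", "helps", "the student"))
print(passive("Bill", "glares"))
```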
Particles, Negation, Adverbial Phrases
Agreement
- the verb group in a sentence or clause must agree in number with
the subject noun phrase: The dog barks but The dogs bark.
- NB: the agreement is with the main noun in the subject noun phrase, not the
most recent noun: The dog with fleas barks is right;
* The dog with fleas bark is wrong.
- This distinction only exists in the simple present tense or
when the first auxiliary inflects:
- The dog barked etc., but
- The dog has barked/is barking/does bark vs
The dogs have barked/are barking/do bark.
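The agreement rule can be sketched as a check between the number of the head noun and the first word of the verb group. The example assumes third person subjects only, and it assumes the caller has already identified the head noun (finding it in general needs a parse). The tables NOUN_NUMBER and VERB_NUMBER and the function name agrees are hypothetical.

```python
# Hypothetical number tables ("sg" = singular, "pl" = plural).
NOUN_NUMBER = {"dog": "sg", "dogs": "pl", "fleas": "pl"}

# Verb forms that show number; forms not listed (e.g. "barked") do not inflect.
VERB_NUMBER = {
    "barks": "sg", "bark": "pl",
    "has": "sg", "have": "pl",
    "is": "sg", "are": "pl",
    "does": "sg", "do": "pl",
}


def agrees(head_noun, verb_group):
    """Check number agreement between head noun and first verb-group word."""
    required = VERB_NUMBER.get(verb_group.split()[0])
    return required is None or required == NOUN_NUMBER[head_noun]


print(agrees("dog", "barks"))        # The dog with fleas barks
print(agrees("dog", "bark"))         # * The dog with fleas bark
print(agrees("dogs", "have barked"))
```

Only the first word of the verb group is consulted, which mirrors the observation that the distinction shows up in the simple present or on the first auxiliary.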
Embedded Sentences (S)
John's giving up the game was cowardly
The man who gave Paul the money was crazy
The money that was given to Paul was lost or
The money given to Paul was lost
Complements
Benedict believes he is the Pope
Margaret wants to own a fire-engine red Porsche
Blake promised that he would never steal a bear again
Adjectives (ADJ) & Adjective Phrases (ADJP)
- Adjectives can be inflected to form comparatives & superlatives:
| uninflected | comparative | superlative |
| good | better | best |
| brave | braver | bravest |
| gracious | more gracious | most gracious |
- Groups of words modifying a noun are termed adjectival phrases
(ADJP):
- the brave tutor
- the noble senior tutor
- the most estimable senior tutor (most is an ADV)
- [He was] hungry for knowledge
- as slow as a FORTRAN lecture on a Spring afternoon
Conjunctions (CONJ)
- words like and, but, if, so, or, ... can connect various kinds of structures together to make more complicated ones:
- He is rich and keeps a secretary to type his AI assignments
- She was poor but she was honest
- I'll have a blue or green one
- I'll eat my hat if Howard apologises to the Stolen Generation
- Granny rocked back and forth in her chair by the fire
- This is totally and outrageously unjustified
Lexical and Phrasal Categories
- Lexical (or preterminal) categories are classes of words.
- So far we've seen N, V, ADJ, ADV, PREP, DET, QUANT, PRO, CONJ.
- Some words may be members of two or more categories, like
damn, which can be a noun, a verb, an adjective, an adverb, or an
interjection (INTERJ).
- Phrasal categories are the remaining categories, and (for English) include
NP, VP, PP, ADVP, ADJP, S.
- Phrasal categories are defined using grammar rules.
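Because a word may belong to several lexical categories, a lexicon is naturally a mapping from words to sets of categories, and the number of possible category sequences for a sentence is the product of the set sizes. The sketch below illustrates this. The entry for damn follows the text above. The entries for time and flies, and the function name tag_sequences, are assumptions added for illustration.

```python
from itertools import product

# Lexicon mapping each word to the set of lexical categories it can have.
LEXICON = {
    "damn": {"N", "V", "ADJ", "ADV", "INTERJ"},  # from the notes
    "time": {"N", "V"},    # assumed entry
    "flies": {"N", "V"},   # assumed entry
    "the": {"DET"},        # assumed entry
}


def tag_sequences(words):
    """List every way of assigning one lexical category to each word."""
    options = [sorted(LEXICON[w]) for w in words]
    return list(product(*options))


for seq in tag_sequences(["time", "flies"]):
    print(seq)
```

A grammar then rules out most of these sequences, which is one reason lexical ambiguity and parsing are usually handled together.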
| Summary: Outline of English Syntax |
While reviewing English syntax, we have introduced a number
of terms and symbols for describing types of words and phrases in English,
including the lexical categories N, V, ADJ, ADV, CONJ, INTERJ, and PREP,
and phrasal categories NP, VP, PP, ADVP, ADJP, VG, and S.
In passing, we also introduced the concept of grammar rules such as
NP → DET NOUN
|
CRICOS Provider Code No. 00098G
Copyright © Bill Wilson, 2007, except where another source is acknowledged.