Next: 2.7.5 Hidden Markov Models
Up: 2.7 Learning tools
Previous: 2.7.3 Instance-based Learning
Although not at the moment a mainstream learning
technique, grammar-based learning algorithms may also be applied,
although they are less general than other learning techniques.
A complete review of grammars is beyond the scope of this thesis, but
a basic introduction will be given here.
A grammar consists of four things:
- A set of terminals,
- where a terminal is the most primitive
symbol in the grammar. In English, for example, words would be
considered terminals.
- A set of non-terminals,
- where non-terminals are symbols which
are, as the name indicates, symbols which are not terminals.
Typically, in English, this would be things like sentences, noun
phrases, adverbial phrases, subjects and so on.
- A set of production rules,
- where a production rule is a
transformation from one sequence of terminals and/or non-terminals to
another sequence of terminals and non-terminals. The left hand side
must have at least one non-terminal in it. An example of a
production would be that if you have a noun_phrase (a
non-terminal symbol) then it can be replaced with
peartridge
(which is a
terminal symbol). There can be more than one production for a given
right-hand side.
- A Start symbol,
- where a start symbol shows where all valid
sequences of symbols we want to be able to produce can be derived
from. For example, in English, the Start symbol might be
sentence, from which we could derive some valid
sentences, by following the production rules.
Given a stream of terminals, it is possible to take these terminals
and build out of it a ``parse tree'' -- a tree illustrating the steps
taken to get from the start symbol to the sequence of terminals
observed (this activity is of course known as parsing)
. This tree acts as a store of information,
and by applying operations to this tree, meaning can be extracted from
the sequence of apparently random terminal symbols.
Although this sounds complex, it is done on a daily basis by the
compilers used in our computers. Although most people think of
symbols as words, it might be possible to take a broader view and
think of basic sub-gestures as the terminals (with some additional
terminals for movement) of a language which could be the set of Auslan
signs.
The problem with these systems at the moment is that these sort of
grammars are typically hand-built. There is on-going research into
automatic grammar learning programs, but these are still in their
infancy.
Next: 2.7.5 Hidden Markov Models
Up: 2.7 Learning tools
Previous: 2.7.3 Instance-based Learning
waleed@cse.unsw.edu.au