Thesis Topics Available for 2003/2004
For more details contact me by phone or arrange a consultation time.
A Dialogue-Style Search Engine for the Internet
Imagine an ideal library assistant: The library user asks for documents
on, say, vehicles. Then, instead of providing the user with 423,372 documents
presented in a ranked list, the library assistant may engage in a dialogue
to narrow down the number of documents that should be recommended. The
following is a sample dialogue:
Assistant: 'What kind of vehicles do you have in mind? We have literature
on trucks, cars, semi-trailers, boats, tanks, etc.'
User: 'I am interested in passenger cars.'
Assistant: 'What do you want to know about passenger cars?'
User: 'I want to know how to change a flat tyre.'
Assistant: 'We have a few books on mechanical repairs of cars. Do you
want a list of those books?'
The search engine www.northernlight.com allows limited interaction by
grouping web pages according to keywords.
This thesis aims at improving the quality of the groupings and of the
keywords chosen to describe each group of documents. The project will
explore techniques for automatically extracting characteristic keywords
from existing web pages, so that these keywords can guide the user when
searching the web. Available search engines, such as AltaVista and Google,
will be used as a source of links to web documents.
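One common starting point for picking characteristic keywords for a group of pages is to favour terms that are frequent within the group but rare across the whole collection (a TF-IDF-style weighting). The following is a minimal sketch under that assumption; the function names, the tiny stopword list and the scoring formula are illustrative choices, not the method the project must use, and fetching pages via a search engine is left out.

```python
import math
import re
from collections import Counter

# Hypothetical minimal stopword list; a real system would use a larger one.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "on", "for", "is", "how", "with"}

def tokenize(text):
    """Lower-case the text, split it into alphabetic tokens and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def characteristic_keywords(group, collection, k=3):
    """Return the k terms that best characterise `group` relative to `collection`.

    Score = (term frequency within the group) * log(N / document frequency in
    the collection): high for terms common in the group but rare overall.
    `group` is assumed to be a subset of `collection`.
    """
    n_docs = len(collection)
    df = Counter()
    for doc in collection:
        df.update(set(tokenize(doc)))  # count each term once per document
    tf = Counter()
    for doc in group:
        tf.update(tokenize(doc))
    scores = {t: c * math.log(n_docs / df[t]) for t, c in tf.items() if df[t] > 0}
    # Highest score first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]

if __name__ == "__main__":
    pages = [
        "How to change a flat tyre on a car",
        "Car tyre pressure and tyre repair",
        "Truck engines and truck maintenance",
        "Boat engines for small boats",
    ]
    # Describe the group formed by the first two pages.
    print(characteristic_keywords(pages[:2], pages))
```

In this toy collection "tyre" ranks first for the two car pages, since it occurs three times in the group and in no other page; words such as "truck" or "engines" never appear in the group and so cannot be chosen.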
Prerequisites: Good programming skills, preferably also
COMP3411 Artificial Intelligence
Group size: 1-2
Learning Word Relationships from Large Amounts of Text
Programs that can process natural language are of increasing interest,
given the large amounts of natural-language text now available in electronic
form. To build high-quality natural language processing (NLP) systems, it
is necessary to tailor them to the specific vocabulary used in a given
domain. To support the automatic analysis of large quantities of text, it
is important to find relevant relationships among terms. This thesis topic
requires building a learning tool that searches for co-occurrences of terms
in text (largely downloaded from the Internet). Interpreting the
co-occurrences of term pairs (and larger groups of terms) meaningfully is
critical; that is, the context in which the terms occur needs to be
considered. Here are a few concrete examples of the tasks to be solved by
the learning software:
- Determine all terms that refer to parts of a 'personal computer', e.g.
hard disk, CD-ROM drive, keyboard, etc.
- Determine all terms that are more specific than 'computer', e.g. laptop,
PC, supercomputer, etc.
This thesis project involves substantial programming, preferably in C or
C++ for speed, although Java is also possible. This topic is challenging
and has the potential to lead to a scientific publication. Starting work
as soon as possible is encouraged.
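A first step for such a tool could be to count how often two terms occur within a few words of each other and to rank the pairs by pointwise mutual information (PMI), which favours pairs that co-occur more often than their individual frequencies would predict. For the 'more specific than' task, one simple cue is the phrase "X such as Y1, Y2 and Y3" (often called a Hearst pattern). The sketch below is written in Python for brevity rather than C or C++; the window size, the PMI weighting, the stopword list and the regular expressions are all assumptions for illustration, and it does not attempt the context-sensitive interpretation the thesis calls for.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; function words would otherwise dominate the counts.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "to", "in"}
# Separators inside an enumeration such as "laptops, PCs and supercomputers".
LIST_SEP = re.compile(r",?\s+(?:and|or)\s+|,\s*")
# Crude "<general term> such as <list>" pattern; assumes the list ends the clause.
SUCH_AS = re.compile(r"(\w+) such as ([\w\s,-]+)")

def tokenize(text):
    """Lower-case word tokens (hyphenated words kept whole), stopwords removed."""
    words = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    return [w for w in words if w not in STOPWORDS]

def cooccurrence_pmi(sentences, window=3, min_count=2):
    """Score unordered term pairs found within `window` tokens of each other by
    PMI(x, y) = log(P(x, y) / (P(x) * P(y))), ignoring pairs seen fewer than
    `min_count` times. Returns (pair, score) items, highest score first."""
    word_counts, pair_counts = Counter(), Counter()
    for sentence in sentences:
        toks = tokenize(sentence)
        word_counts.update(toks)
        for i, w in enumerate(toks):
            for v in toks[i + 1 : i + 1 + window]:
                if v != w:
                    pair_counts[tuple(sorted((w, v)))] += 1
    n_words = sum(word_counts.values())
    n_pairs = sum(pair_counts.values())
    scores = {}
    for (x, y), c in pair_counts.items():
        if c >= min_count:
            p_xy = c / n_pairs
            p_x, p_y = word_counts[x] / n_words, word_counts[y] / n_words
            scores[(x, y)] = math.log(p_xy / (p_x * p_y))
    return sorted(scores.items(), key=lambda kv: -kv[1])

def more_specific_terms(sentences, general_term):
    """Collect Y1, Y2, ... from phrases '<general_term> such as Y1, Y2 and Y3'."""
    found = set()
    for sentence in sentences:
        for head, tail in SUCH_AS.findall(sentence.lower()):
            if head == general_term.lower():
                found.update(item.strip() for item in LIST_SEP.split(tail) if item.strip())
    return found

if __name__ == "__main__":
    corpus = [
        "the keyboard of the personal computer",
        "a personal computer needs a keyboard",
        "There are many computers such as laptops, PCs and supercomputers.",
    ]
    print(cooccurrence_pmi(corpus))
    print(more_specific_terms(corpus, "computers"))
```

Raw co-occurrence alone does not say *how* two terms are related (part-of versus kind-of); patterns like the one above are one way to bring the surrounding context into the decision.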
Prerequisites: Good programming skills, COMP3411 Artificial Intelligence
Group size: 1-2
Building Decision Trees Using Look-Ahead
Decision tree learning is one of the most important techniques in machine
learning. Most decision tree learning algorithms grow the tree step by
step: a recursive algorithm decides, for a given set of training examples,
whether they should all be classified by a single leaf node or whether the
set should be split into two or more subsets, each of which is assigned
to one child of the current node. To make this decision, usually only the
subsets that can be created in one step (one level of the decision tree)
are considered. A few studies have examined whether better trees can be
obtained by looking ahead several levels, but so far they have not
revealed any significant benefit from doing so. This thesis is about
revisiting these results and conducting more detailed studies. It requires
good C programming skills.
The thesis could be based on existing software, which can either serve as
a model for developing your own learning program from scratch or be
modified directly.
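The difference between one-level and look-ahead evaluation can be sketched as follows. This is a minimal illustration in Python (the project itself asks for C), assuming entropy as the split criterion as in ID3/C4.5-style learners; the function names and the toy data are hypothetical and do not reproduce the existing software mentioned above. With `lookahead=0` each candidate split is judged only by the entropy of its children; with `lookahead=1` each child may be split once more before the candidate is judged. In the toy data the class is `x0 XOR x1`, so neither `x0` nor `x1` looks useful on its own, and the greedy choice falls on the weakly informative `x2`.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels; 0 for an empty list."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def partition(examples, attr):
    """Split (features, label) pairs into subsets by the value of feature `attr`."""
    groups = {}
    for x, y in examples:
        groups.setdefault(x[attr], []).append((x, y))
    return list(groups.values())

def split_score(examples, attr, attrs, depth):
    """Weighted entropy after splitting on `attr`, where each child may be split
    up to `depth` further levels (depth=0 is the usual one-level evaluation)."""
    rest = [b for b in attrs if b != attr]
    n = len(examples)
    return sum(len(s) / n * best_entropy(s, rest, depth) for s in partition(examples, attr))

def best_entropy(examples, attrs, depth):
    """Lowest weighted entropy reachable from `examples` with at most `depth` more levels."""
    current = entropy([y for _, y in examples])
    if depth == 0 or not attrs or current == 0:
        return current
    return min(current, min(split_score(examples, a, attrs, depth - 1) for a in attrs))

def choose_attribute(examples, attrs, lookahead=0):
    """Greedy choice for lookahead=0; otherwise judge each split by the best
    subtree reachable `lookahead` levels below it."""
    return min(attrs, key=lambda a: split_score(examples, a, attrs, lookahead))

# Toy data: features (x0, x1, x2), class = x0 XOR x1; x2 is only partly correlated.
DATA = [
    ((0, 0, 0), 0), ((0, 0, 0), 0), ((0, 1, 1), 1), ((0, 1, 1), 1),
    ((1, 0, 1), 1), ((1, 0, 0), 1), ((1, 1, 0), 0), ((1, 1, 1), 0),
]

if __name__ == "__main__":
    print("greedy choice:", choose_attribute(DATA, [0, 1, 2], lookahead=0))
    print("one-level look-ahead choice:", choose_attribute(DATA, [0, 1, 2], lookahead=1))
```

Here the greedy learner picks `x2`, while one level of look-ahead picks `x0`, after which a single further split yields pure leaves. A constructed example like this only shows the mechanism; whether look-ahead pays off on realistic data, at its considerably higher cost, is the question the thesis would investigate.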
Prerequisites: Good programming skills, COMP3411 Artificial
Intelligence
Group size: 1
Advanced Interactive Document Reader
This project is about developing a reader for Portable Document Format
(PDF) and/or PostScript files. (Programs are available to convert one
format into the other.)
Besides displaying documents, the new reader should go beyond the features
commonly available in document readers such as Ghostview or Acrobat Reader.
It should be possible to search for words and phrases, where a search word
may also be matched against synonyms taken from a suitable dictionary.
Ideally, the reader should have a 'virtual hypertext facility': whenever
the user wants to find out more about a certain term or phrase, the reader
searches the entire document and displays the relevant information either
in the same window or in an additional one.
For example, the document reader displays definitions of technical terms
and other relevant information at the click of a button. This goes towards
eliminating the need for the author to insert hyperlinks into a document
manually; instead, the document reader creates hyperlinks on the fly.
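The two text-level features described above, synonym-aware search and on-the-fly lookup of a term's definition, can be sketched on plain text that has already been extracted from the PDF or PostScript file; extraction and display (for instance via the Ghostview code) are not shown. In this minimal Python sketch the synonym table is a hypothetical stand-in for a real dictionary, and spotting definitions through cue phrases such as "is a" or "is defined as" is just one simple heuristic, assumed here for illustration.

```python
import re

# Hypothetical stand-in for "a suitable dictionary" of synonyms.
SYNONYMS = {
    "car": {"automobile", "passenger car"},
    "disk": {"disc", "hard disk"},
}

# Hypothetical cue phrases that often introduce a definition: "<term> is a ...".
DEFINITION_CUES = ("is defined as", "is a", "is an", "refers to")

def expand(term):
    """Return the lower-cased search term together with its synonyms."""
    term = term.lower()
    return {term} | SYNONYMS.get(term, set())

def search(text, term):
    """Return (offset, matched text) for every whole-word, case-insensitive
    occurrence of the term or one of its synonyms, sorted by position."""
    hits = []
    for variant in expand(term):
        pattern = r"\b" + re.escape(variant) + r"\b"
        hits += [(m.start(), m.group(0)) for m in re.finditer(pattern, text, re.IGNORECASE)]
    return sorted(hits)

def find_definition(text, term):
    """Return the first sentence that looks like a definition of `term`
    (e.g. '<term> is a ...'), or None if no such sentence is found."""
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for cue in DEFINITION_CUES:
            pattern = r"\b" + re.escape(term.lower()) + r"\s+" + cue + r"\b"
            if re.search(pattern, sentence.lower()):
                return sentence.strip()
    return None

if __name__ == "__main__":
    doc = ("A hard disk stores data. The automobile was parked outside. "
           "A PostScript file is a program that describes a page. The car is red.")
    print(search(doc, "car"))
    print(find_definition(doc, "PostScript file"))
```

In a real reader, the offsets returned by `search` would be mapped back to page positions so the matches can be highlighted, and `find_definition` would be triggered by the user's click on a term.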
The project can and should build upon existing software such as the
Ghostview package, which is available as C source code.
Prerequisites: Good programming skills, preferably also
COMP3411 Artificial Intelligence
Group size: 1-2