Thesis Topics Available for 2003/2004

For more details contact me by phone or arrange a consultation time.

A Dialogue-Style Search Engine for the Internet

Imagine an ideal library assistant: The library user asks for documents on, say, vehicles. Then, instead of providing the user with 423,372 documents presented in a ranked list, the library assistant may engage in a dialogue to narrow down the number of documents that should be recommended. The following is a sample dialogue:
 

Assistant: 'What kind of vehicles do you have in mind? We have literature on trucks, cars, semi-trailers, boats, tanks, etc.'
User: 'I am interested in passenger cars.'
Assistant: 'What do you want to know about passenger cars?'
User: 'I want to know how to change a flat tyre.'
Assistant: 'We have a few books on mechanical repairs of cars. Do you want a list of those books?'

One search engine that allows limited interaction by grouping web pages according to keywords is www.northernlight.com. This thesis aims to improve the quality of such groupings and of the keywords chosen to describe each group of documents. The project will explore techniques for automatically extracting characteristic keywords from existing web pages, so that these keywords can guide the user while searching the web. Available search engines, such as Altavista and Google, will be used as a source of links to web documents.
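
As a rough illustration of what 'characteristic keywords' for a group of documents could mean, the following Python sketch scores each term by how frequent it is inside one group and how rare it is across the other groups (a TF-IDF-style weighting). This is only one possible approach and is not prescribed by the topic; the function names and the toy groups are assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def characteristic_keywords(groups, top_n=3):
    """Return the top_n highest-scoring terms for each group of documents.

    groups maps a group name to a list of document strings. A term scores
    highly for a group if it is frequent there and absent from other groups.
    """
    term_counts = {
        name: Counter(word for doc in docs for word in tokenize(doc))
        for name, docs in groups.items()
    }
    n_groups = len(groups)
    # Number of groups in which each term appears at least once.
    group_freq = Counter(term for counts in term_counts.values() for term in counts)

    keywords = {}
    for name, counts in term_counts.items():
        scores = {
            term: count * math.log(n_groups / group_freq[term])
            for term, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        keywords[name] = ranked[:top_n]
    return keywords


# Hypothetical toy data standing in for downloaded web pages.
example_groups = {
    "cars": ["change a flat tyre on a car", "car tyre repair"],
    "boats": ["repair a boat hull", "sailing a boat"],
}
example_keywords = characteristic_keywords(example_groups, top_n=2)
```

Terms that occur in every group (here 'a' and 'repair') receive a score of zero, so they are never preferred over terms that are specific to one group.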

Prerequisites: Good programming skills, preferably also COMP3411 Artificial Intelligence

Group size: 1-2


Learning Word Relationships from Large Amounts of Text

Programs that can process natural language are of increasing interest, given the large amounts of natural language text available in electronic form. To build high-quality natural language processing (NLP) systems, it is necessary to tailor them to the specific vocabulary used in a given domain. Finding relevant relationships among terms is important for supporting the automatic analysis of large quantities of text. This thesis topic requires building a learning tool that searches for co-occurrences of terms in text (largely to be downloaded from the Internet). Meaningful interpretation of co-occurrences of term pairs etc. is critical, i.e. the context in which terms occur needs to be considered.

Here are a few concrete examples of the tasks to be solved by the learning software:
- Determine all terms that refer to parts of a 'personal computer', e.g. hard disk, CD-ROM drive, keyboard, etc.
- Determine all terms that are more specific than 'computer', e.g. laptop, PC, super-computer, etc.

This thesis project involves substantial programming, preferably in C or C++ for speed, although Java is also possible. This topic is challenging and has the potential to lead to a scientific publication. Commencement of work as soon as possible is encouraged.
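
To make the idea of co-occurrence and context more concrete, here is a minimal Python sketch (Python is used only for brevity; the topic itself suggests C, C++ or Java). The first function counts how often two terms from a given vocabulary appear in the same sentence. The second uses one simple lexical pattern ('Xs such as A, B and C') as an example of taking context into account when looking for more specific terms. Both the function names and the choice of pattern are assumptions for illustration, not part of the topic description.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_counts(text, vocabulary):
    """Count how often two vocabulary terms appear in the same sentence."""
    vocabulary = set(vocabulary)
    counts = Counter()
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = set(re.findall(r"[a-z][a-z-]*", sentence))
        present = sorted(words & vocabulary)
        # Each unordered pair of terms is counted once per sentence.
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


def more_specific_terms(text, general_term):
    """Collect terms listed after '<general_term>s such as ...' in the text."""
    pattern = re.compile(re.escape(general_term.lower()) + r"s? such as ([^.]+)")
    found = set()
    for match in pattern.finditer(text.lower()):
        # Split the enumeration on commas and on the words 'and' / 'or'.
        for item in re.split(r",|\band\b|\bor\b", match.group(1)):
            item = item.strip()
            if item:
                found.add(item)
    return found
```

A plain co-occurrence count cannot tell whether 'keyboard' is a part of a 'laptop' or a kind of 'laptop'; a context pattern such as the one above is one simple way to separate the two kinds of relationship, although a real system would need many more patterns and some handling of plurals.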
 

Prerequisites: Good programming skills, COMP3411 Artificial Intelligence

Group size: 1-2


Building Decision Trees using Look Ahead

Learning decision trees is one of the most important techniques in machine learning. Most decision tree learning algorithms grow the tree step by step, i.e. a recursive algorithm decides, for a given set of training examples, whether they should all be classified by a single leaf node or whether the set should be broken up into two or more subsets, with each subset assigned to one child node of the current node in the tree. To make this decision, usually only the subsets that can be created in one step (one level of the decision tree) are considered. There have been a few studies on whether better trees can be obtained by looking ahead several levels. So far, those studies have not revealed any significant benefit from looking ahead. This thesis is about revisiting these results and conducting more detailed studies. It requires good C programming skills. The thesis could be based on existing software, which can either serve as a model for developing your own learning program from scratch or be modified directly.
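
The difference between a one-step choice and a look-ahead choice can be sketched as follows. The Python code below (used here only for compactness; the thesis itself calls for C) scores a candidate split attribute by the entropy that remains after one level, and alternatively by the entropy that remains after two levels. Entropy as the splitting criterion, the function names and the XOR-style toy data are all assumptions for this example.

```python
import math
from collections import Counter
from itertools import product


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def split(examples, attr):
    """Partition (features, label) examples by the value of one attribute."""
    parts = {}
    for features, label in examples:
        parts.setdefault(features[attr], []).append((features, label))
    return list(parts.values())


def greedy_score(examples, attr):
    """Entropy remaining after splitting on attr alone (one level)."""
    total = len(examples)
    return sum(
        len(subset) / total * entropy([label for _, label in subset])
        for subset in split(examples, attr)
    )


def lookahead_score(examples, attr, attributes):
    """Entropy remaining after splitting on attr and then, in each child,
    on whichever remaining attribute is best for that child (two levels)."""
    total = len(examples)
    remaining = [a for a in attributes if a != attr]
    score = 0.0
    for subset in split(examples, attr):
        best = entropy([label for _, label in subset])
        for second in remaining:
            best = min(best, greedy_score(subset, second))
        score += len(subset) / total * best
    return score


# Toy data: the label is the XOR of attributes 0 and 1; attribute 2 is irrelevant.
xor_data = [((a, b, c), a ^ b) for a, b, c in product([0, 1], repeat=3)]
```

On this toy data the one-level score cannot distinguish the relevant attribute 0 from the irrelevant attribute 2, whereas the two-level score can. Whether such cases matter on realistic data sets is exactly the kind of question the studies mentioned above address.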

Prerequisites: Good programming skills, COMP3411 Artificial Intelligence

Group size: 1


Advanced Interactive Document Reader

This project is about developing a reader for Portable Document Format (PDF) files and/or PostScript files. (Programs are available to convert one format into the other.)
The new document reader should allow users to read documents and should also go beyond the features commonly available in document readers such as Ghostview or Acrobat Reader. It should be possible to search for words and phrases, where a search word may also be matched against synonyms taken from a suitable dictionary. Ideally, the reader should have a 'virtual hypertext facility', i.e. whenever the user wants to find out more about a certain term or phrase, the document reader searches the entire document and displays the relevant information either in the same window or in an additional one. For example, the document reader displays definitions of technical terms and other relevant information at the click of a button. This goes towards eliminating the need for the author to insert hyperlinks into a document manually. Instead, it is the document reader that creates hyperlinks on the fly.
The project can and should build upon existing software such as the Ghostview package, which is available in source code (in C).
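
The two search features described above can be sketched in a few lines of Python, assuming the text of each page has already been extracted from the PDF or PostScript file. The tiny thesaurus, the function names and the 'term is ...' heuristic for spotting definitions are hypothetical and serve only to illustrate the idea; a real reader would use a proper dictionary and a more robust notion of 'relevant information'.

```python
import re

# Hypothetical miniature thesaurus; a real reader would load a suitable dictionary.
SYNONYMS = {
    "car": {"automobile", "vehicle"},
    "error": {"fault", "mistake"},
}


def search(pages, term, synonyms=SYNONYMS):
    """Return (page_number, matched_word) for each hit of term or a synonym."""
    variants = {term.lower()} | set(synonyms.get(term.lower(), ()))
    alternatives = "|".join(re.escape(v) for v in sorted(variants))
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    hits = []
    for number, text in enumerate(pages, start=1):
        for match in pattern.finditer(text):
            hits.append((number, match.group(1).lower()))
    return hits


def find_definitions(pages, term):
    """Return (page_number, sentence) for sentences of the form '<term> is ...'."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\s+(is|are|means)\b", re.IGNORECASE)
    results = []
    for number, text in enumerate(pages, start=1):
        # Split the page into sentences at whitespace following ., ! or ?
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            if pattern.search(sentence):
                results.append((number, sentence.strip()))
    return results
```

In a reader, the result of find_definitions for the term under the mouse pointer could be shown in a separate window, which is one way to create hyperlinks on the fly without any markup in the document.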

Prerequisites: Good programming skills, preferably also COMP3411 Artificial Intelligence

Group size: 1-2