A Compiler Toolkit in Haskell

Functional languages like Haskell are ideal for implementing compilers. Complemented with a lexer and parser generator, they provide much of the functionality and ease-of-use associated with specialized compiler tools (aka compiler compilers) while being more flexible, better supported, and guaranteeing better long time availability.

There is a significant set of functionality that is required in each compiler like symbol table management, input-output operations, error management, and so on, which are good candidates for code reuse. The Compiler Toolkit is an attempt to provide an open collection of modules for these recurring tasks in a popular functional language. The toolkit is used for the interface generator C->Haskell.

Although the toolkit is developed with the Glasgow Haskell Compiler (GHC), a special effort is made to keep the system portable. In fact, all code that is either not Haskell 98 is located in a single module. Furthermore, the code is well commented, additional documentation is provided in a technical report, and the source code of two compilers currently developed on top of the toolkit is available.

A subset of the toolkit, called CTKlight, that essentially provides support for implementing syntactical analysis plus some utilities without the other, more heavy-weight features of the toolkit is available separately. It works with both GHC and Hugs 98.

The latest news:

Functionality

Currently, the toolkit provides functionality in three main areas plus some infrastructure like a sophisticated makefile system and some utility routines.

General Purpose
This includes finite maps, pretty printing, and generation of unique names.
Symbols & Attributes
Datatypes and routines for handling identifiers, attribute tables, and hierarchical name spaces.
State
Monad-based handling of compiler-global state (tracking and formating of error messages, exception handling, management of compiler options, and so on) and phase-local state information (identifier-object associations, error recovery information, and so on).
Syntax
Self-optimizing combinators for lexical and syntactical analysis. The parser combinators are based on ideas from Swierstra & Duponcheel (but don't support error correction yet). The lexer combinators are similar in spirit, as they - at runtime - generate a state transition table to speed up the recognition process, but the applied technique is (as far as I know) original - a paper is available.

In a small compiler project, these two modules can be used without the state handling (to make for a more light-weight system). See Chapter 5 in the documentation and CTKlight.

Of special interest might be the implementation of structure tree attributes, as it is rather different from the "traditional way" of building compilers in functional languages. Instead of being stored directly in the structure tree, the attributes are located in separate attribute tables that are indexed by attribute identifiers contained in each tree node. This minimizes the need to modify the structure tree during attribute evaluation and makes it easy to discard no longer needed attributes.

Usability

In its current state, the Compiler Toolkit already provides a significant amount of core functionality, which is reasonably well tested. While it is still consider to be a beta and has not undergone much performance debugging, it is already expected it to be helpful to somebody starting a new compiler effort.

Software

To build the code, you need a reasonably recent version of the Glasgow Haskell Compiler.

CTKlight

The following releases of the Compiler Toolkit Light are available:

Copyleft and Feedback

The code is covered by the GPL (GNU General Public License); CTKlight is available under the LGPL (GNU Library General Public License). Bug reports, comments, suggestions, and code contributions should be directed to chak@cse.unsw.edu.au

Acknowledgements

Sven Panne authored the module GetOpt.hs for processing command-line arguments. Gabriele Keller, Roman Lechtchinsky, Wolf Pfannenstiel, and Martin Simons provided feedback in the context of the Nepal compiler project. Simon L. Peyton Jones suggested a number of valuable improvements to the self-optimising lexer library.


This page is part of Manuel Chakravarty's WWW-stuff.

Last modified: Wed May 18 14:50:56 EST 2005