About me

I am a PhD student in the Programming Languages and Systems group at UNSW in Sydney, Australia. My supervisors are Ben Lippmeier, Manuel Chakravarty and Gabrielle Keller. I am interested in optimisations for purely functional languages, specifically streaming models and fusion for efficient concurrent execution of multiple queries.

The dangers of greedy fusion

Clusterings for fusing a dataflow graph. The square nodes represent input streams, while round nodes represent scan combinators. Dashed lines are fusion-preventing dependencies. (Middle) shows a greedy clustering with four loops; (Right) shows optimising for minimal loops with three loops.



Folderol: stream fusion for multiple queries or consumers

Folderol is an implementation of Machine fusion for fusing and executing concurrent streaming queries. Combinators in a streaming program are represented as processes in a Kahn process network. We keep the determinism of list and streaming programs, while being able to coordinating between multiple consumers, which is necessary for executing multiple queries. This project uses Template Haskell to construct process networks and generate fused processes at compile-time.

This project is a combination of conceptual work on a novel fusion system and low-level implementation details for generation of efficient code.

Icicle: a streaming query language

When dealing with large data sets that do not fit in memory, it is crucial to limit the number of accesses and iterations over the data set. However, in a high level language it may not be immediately obvious how many iterations a particular program will require. The number of iterations becomes even less obvious in the presence of heuristic and statistics-based optimisations, as used by traditional databases: a small tweak to the query or even modifying the number of rows in a table can cause drastic changes to the query plan.

As data sets continue to grow, a high level language with predictable runtime characteristics becomes more necessary. Programmers should not need to understand the internal workings of a database or query optimiser in order to write fast queries.

At Ambiata, we have designed and implemented Icicle, a query language specifically for single-pass queries. This means that any query in our language is assured to compile into a single iteration over the data set. We use a type system based on temporal logic to ensure that queries can be executed in a single pass without violating causality. Queries are then compiled to a stream-based intermediate language, which allows multiple queries to be merged together, removing duplicate computations. Finally, queries are compiled to high-performance C code.


Finding the best fusion clustering is actually NP-hard (as proved by Alain Darte). By converting combinator programs to integer linear programs, we can generally find good clusterings in adequate time.

Disciplined Disciple Compiler (DDC)

DDC is a research compiler for a strict functional language with effect typing. It isn’t particularly useful yet, but has some interesting optimisations.

Linear Integer/Mixed Programming (LIMP)

A fairly simple Haskell library for expressing linear programs. At the moment, there’s only a simplifier, COIN/CBC bindings, and pretty-printing.

Audio pilot

A game that takes music and lets you fly around a tunnel. Written in Haskell, using OpenGL.

Social media