In José Rolim et al., editors, Parallel and Distributed Processing, Fourth International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS'99), pp. 108-122, LNCS 1586, 1999.
Parallel programming languages like Fortran 90, Nesl, and approaches based on the Bird-Meertens Formalism (BMF) support operations that manipulate aggregate data structures as a whole. These operations are usually implemented by a library of parallel operations based on an implementation-specific representation of the aggregate structures. The library routines are often high-level and thus simplify the compiler and encapsulate machine dependencies. However, such an approach complicates optimizations aimed at good use of processor caches and at minimizing communication on distributed-memory machines. We propose a different implementation technique for aggregate data structures that admits compiler optimizations at all levels of the memory hierarchy. By breaking the abstraction enforced by a library, optimizations across multiple operations become possible and data distribution becomes more flexible. We outline an intermediate language that distinguishes local from distributed data structures and separates local from global operations. This language allows us to implement the mentioned optimizations by program transformation; as a result, the optimizations are easy to comprehend, the compiler can be well structured, the optimizations are easily re-ordered, and each optimization can be proved correct individually. Finally, we report first results obtained by applying our approach to the implementation of nested data parallelism on distributed-memory machines.
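The paper defines the intermediate language formally; as an informal illustration only, the following Python sketch (all names hypothetical, not taken from the paper) models the local/global split a "distributed" array is given here: a list of per-processor chunks, where global operations move data between chunks and local operations compute within a chunk without communication.

```python
# Hypothetical sketch: a distributed array is a list of per-processor
# local chunks. Global operations are the only ones that move data;
# local operations are pure per-chunk computation.

def distribute(xs, p):
    """Global: block-distribute a flat sequence over p processors."""
    n = -(-len(xs) // p)  # ceiling division gives the chunk size
    return [xs[i * n:(i + 1) * n] for i in range(p)]

def local_map(f, chunks):
    """Local: apply f element-wise within each chunk; no communication."""
    return [[f(x) for x in chunk] for chunk in chunks]

def global_sum(chunks):
    """Global: combine per-chunk partial sums (the communication step)."""
    return sum(sum(chunk) for chunk in chunks)

# Because local_map involves no communication, two successive local maps
# can be fused into a single pass over each chunk -- an optimization
# across operations that a fixed library interface would hide:
#   local_map(f, local_map(g, xs))  ==  local_map(lambda x: f(g(x)), xs)
```

Once local and global operations are syntactically separated like this, such fusion rules can be applied as source-level program transformations, which is the style of optimization the abstract describes.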
PostScript version (15 pages).
This page is part of Manuel Chakravarty's WWW-stuff.