
Harnessing the Multicores with Data Parallel Haskell

Data Parallel Haskell is a set of program transformations and a self-optimising, parallel array library integrated into the Glasgow Haskell Compiler, the most widely used Haskell compiler. It extends the broad-spectrum, purely functional programming language Haskell with nested data parallelism, targeting multicore CPUs and GPUs.

The Problem

Computers are no longer getting faster; instead, we will be offered computers containing more and more CPUs, each of which is no faster than those of the previous generation. As the number of CPUs increases, it becomes more and more difficult for a programmer to deal with the interactions of large numbers of threads. Moreover, the physical limitations of bus bandwidth will mean that memory access times will be increasingly non-uniform (even if the address space is shared), and locality of reference will be increasingly important.

State of the Art

In the world of massively parallel computing with strong locality requirements there is already a well-established, demonstrably successful brand leader, namely data parallelism. In a data-parallel computation, one performs the same computation on a large collection of differing data values. Well-known examples of data-parallel programming environments are parallel Fortran dialects, the collective operations of the Message Passing Interface (MPI), OpenMP, NVIDIA's Compute Unified Device Architecture (CUDA) API for graphics processors, and Google's map/reduce framework. Unfortunately, flat data parallelism is rather restrictive for the programmer: sub-computations performed in parallel must all process sub-structures of the same shape and size. The resulting parallel structure is restricted to rectangular, multi-dimensional arrays, and it lacks composability and modularity.
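To make the restriction concrete, here is a small sketch of what flat data parallelism looks like. It uses an ordinary Haskell list `map` to stand in for a data-parallel map (in DPH notation this would be `mapP` over a parallel array); the function `scale` is our own illustrative example, not part of any DPH API.

```haskell
-- Flat data parallelism: one operation applied uniformly across a
-- regular collection. Scaling every element of a dense vector is
-- flat, because every sub-computation has exactly the same shape.
scale :: Double -> [Double] -> [Double]
scale k = map (k *)

-- What flat data parallelism cannot express directly is a parallel
-- computation whose parallel sub-tasks have different sizes, e.g.
-- processing the rows of a sparse matrix, where each row may hold a
-- different number of non-zero elements.
```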
Nested Data Parallelism

Nested data parallelism is more expressive: it supports irregular and sparse data structures and algorithms by enabling parallel sub-computations of widely varying shape and size, and it is dynamic, composable, and modular. The basic idea has been around for a considerable while; cf. Blelloch's programming language NESL. However, it has never been integrated into a general-purpose language. We are adding nested data parallelism to the broad-spectrum, purely functional language Haskell. Haskell is pure, restricting side effects by default. Haskell's purity is essential to implement nested data parallelism, as computations are radically reordered.

Data Parallel Haskell

We express data parallelism in the form of collective operations over nested parallel arrays. Here we define dense vectors as arrays of double-precision floating point numbers, sparse vectors as arrays of index-value pairs, and sparse matrices as arrays of sparse vectors:

type Vector       = [: Double :]
type SparseVector = [: (Int, Double) :]
type SparseMatrix = [: SparseVector :]

Given these data types, we can implement the dot product of a sparse and a dense vector using a parallel sum that consumes the result of an array comprehension; the comprehension generates the products by indexing the dense vector with the indices of the sparse vector:

sdotp :: SparseVector -> Vector -> Double
sdotp sv v = sumP [: x * (v !: i) | (i, x) <- sv :]

Finally, we multiply an entire sparse matrix with a dense vector by computing a sparse dot product for every row of the matrix:

smvm :: SparseMatrix -> Vector -> Vector
smvm m v = [: sdotp row v | row <- m :]

This algorithm nests parallelism in an essential way: we may perform all of the sparse dot products in parallel, although those dot products are themselves parallel operations, and the sizes of different sparse rows may be very different.
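The semantics of the code above can be mirrored with ordinary Haskell lists: a sketch in which the parallel array type `[: :]` becomes a list, `sumP` becomes `sum`, `(!:)` becomes `(!!)`, and array comprehensions become list comprehensions. This runs on any Haskell compiler, but of course exhibits no parallelism; it only illustrates what the DPH program computes.

```haskell
-- Sequential list-based analogue of the nested-data-parallel code.
type Vector       = [Double]
type SparseVector = [(Int, Double)]
type SparseMatrix = [SparseVector]

-- Dot product of a sparse and a dense vector.
sdotp :: SparseVector -> Vector -> Double
sdotp sv v = sum [ x * (v !! i) | (i, x) <- sv ]

-- Sparse matrix times dense vector: one dot product per row.
smvm :: SparseMatrix -> Vector -> Vector
smvm m v = [ sdotp row v | row <- m ]

main :: IO ()
main = print (smvm [[(0, 2.0)], [(1, 3.0), (2, 1.0)]] [1.0, 2.0, 3.0])
-- prints [2.0,9.0]
```

Note how the rows of the example matrix have different lengths; in the DPH version, each row's dot product is itself a parallel computation, which is exactly the nesting that flat data parallelism cannot express.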
Implementing Data Parallel Haskell

Our implementation of Data Parallel Haskell is based on the mature, highly optimising Glasgow Haskell Compiler, developed at Microsoft Research, Cambridge, U.K., together with international collaborators. At the heart of our approach are a vectorising program transformation and a high-performance, self-optimising array library, which are described in more detail in the papers "Harnessing the Multicores: Nested Data Parallelism in Haskell" and "Data Parallel Haskell: a status report", respectively. Currently, the array library has only two backends: one is purely sequential, and the other uses GHC's multi-threading support to parallelise array computations for SMP hardware, specifically multicore CPUs. However, the architecture is generic, and we are planning further array backends, such as a backend targeting GPUs via CUDA and/or OpenCL.

Performance
We gathered performance figures for an early implementation of Data Parallel Haskell on three simple benchmarks.
The graphs illustrate performance relative to a sequential C program; i.e., as soon as a curve climbs over 1 on the y-axis, its absolute performance is better than that of the sequential C program. The benchmarks were executed on two different hardware architectures, both of which were equipped with 8 cores. The first has 8 cores with one hardware thread each, in the form of two quad-core 3GHz Xeon processors. The second has 8 cores with 8 hardware threads each, in the form of a single 1.4GHz UltraSPARC T2.
The scalability of these three programs in Data Parallel Haskell is very good.

Copyright 2009 Manuel M T Chakravarty. Last modified: 10 March 2009.