| ![]() |
Automatic Spark Strategies and Granularity
for a Parallel Functional Language Reducer
Kevin Hammond, Jim S. Mattson Jr. and Simon L. Peyton Jones ?
University of Glasgow, Glasgow, UK
Abstract. This paper considers the issue of dynamic task control in the context of a parallel Haskell implementation on the GRIP multiprocessor. For the first time, we report the effect of our task control strategies on task granularity, as measured in terms of dynamic heap allocations. This gives a concrete means of measuring the effectiveness of these strategies other than wall-clock timings, which are notoriously uninformative.
1 Introduction
Compared with most conventional approaches to parallel programming, it is comparatively easy to write parallel functional programs. Deadlock is impossible. Because programs are deterministic, they may be debugged on sequential hardware. Programs can be highly scalable, running without change on different processor configurations or even architectures. However this flexibility places a high load on the functional compiler and runtime system. Our approach is to exploit parallelism which is notionally fairly fine-grained, but which can be dynamically combined to form much larger grained programs. Fine granularity allows greater flexibility when programming, and has better worst-case scheduling properties than coarse granularity [3], but carries much greater overhead on an architecture built from conventional CPUs. We use a state-of-the-art sequential functional language compiler, the Glasgow Haskell Compiler [16], modified with support for parallel language primitives and a sophisticated runtime system.
Previous research has considered problems of determining appropriate throttling and scheduling strategies for particular applications [7, 8]. In this paper we study the effect of two simple spark control strategies (global" and local" sparking) on performance and task granularity.
1.1 GRIP
GRIP is a MIMD parallel machine, designed to execute parallel functional programs [15]. Logically, the architecture comprises a number of processing elements (PEs) and intelligent memory units (IMUs) connected by a fast network,
? This work is supported by a Royal Society of Edinburgh Research Fellowship, and the SERC AQUA Project. Authors' address: Computing Science Dept, Glasgow University, 17 Lilybank Gdns., Glasgow, Scotland. Email: [email protected]