Abstract
Accesses to shared memory remain a major
performance limitation in shared memory multiprocessors.
Scalable multiprocessors with distributed memory also
pose the problem of keeping the memory coherent.
A large number of shared memory coherence mechanisms
have been proposed to solve this problem. Their relative
performance is, however, determined by the sharing
behaviour of the workloads.
This paper presents a methodology to capture and visualise
the sharing behaviour of a parallel program with
respect to the choice of coherence mechanism. We identify
four conceptual workload parameters: Spatial Granularity,
Degree of Sharing, Access Mode, and Temporal Granularity. To
demonstrate the effectiveness of the methodology, we
have analysed the sharing behaviour of two parallel applications.
The results are used to judge which shared memory
coherence mechanism is most appropriate.
1.0 Introduction
The shared memory paradigm for programming parallel applications on multiprocessors and other environments has emerged as the preferred paradigm over alternatives such as the message passing model. Several small-scale, bus-based, shared memory multiprocessors are now commercially available, and numerous research projects on large-scale shared memory multiprocessors are under way, e.g. [1, 15]. In addition, a number of ongoing projects aim to put a shared memory environment on distributed memory machines or to use a network of workstations as a shared memory multicomputer [16].
In order to make shared memory multiprocessors scalable, focus has recently been on distributed shared memory, where the memory is distributed among the processing nodes. Figure 1 shows a general view of such an architecture. A processor together with some memory forms a processing node. A number of processing nodes are interconnected by an interconnection network. The local memory of a processing node is directly accessible by the
processor. The contents of the memory of the other processing
nodes are accessible either directly or through some software
mechanism, in both cases at substantially higher cost
than for the local memory.
The distribution of memory across the processing nodes
makes it very important to exploit the locality of reference
of the parallel programs in order to minimise the average
access time to shared memory. One approach to automate
this has been to use memory coherence mechanisms so that
shared variables may be replicated or automatically
migrated between the processing nodes.
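As a rough illustration of why locality matters, the average access time can be modelled as a weighted mean of the local and remote access costs. The sketch below is not taken from the paper: the function name `average_access_time`, the two-level cost model, and the cycle counts are assumptions chosen only to make the trade-off concrete.

```python
def average_access_time(local_fraction, t_local, t_remote):
    """Weighted mean access time for a simple two-level (local/remote) model.

    local_fraction: fraction of shared memory accesses served by local memory.
    t_local, t_remote: assumed access costs, in the same (arbitrary) time unit.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be in [0, 1]")
    return local_fraction * t_local + (1.0 - local_fraction) * t_remote


# Hypothetical costs in processor cycles, chosen only for illustration.
T_LOCAL = 10
T_REMOTE = 100

for fraction in (0.5, 0.9, 0.99):
    # A higher local fraction pulls the average towards the local cost.
    print(fraction, average_access_time(fraction, T_LOCAL, T_REMOTE))
```

Under this simplified model, replication and migration pay off to the extent that they raise the local fraction by more than their own coherence overhead costs, which the model deliberately leaves out.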
Several memory coherence mechanisms have been proposed
in the literature [8, 14, 20]. They encompass both
cache coherence maintenance and virtual page level management.
The trend in these mechanisms is to make some
assumptions about program behaviour in order to reduce complexity
and implementation cost while still retaining good
performance. The various memory coherence mechanisms
have usually been evaluated through test implementations,
using either synthetic workloads or some presumably representative
set of programs. This is very time-consuming and
costly if a large number of mechanisms are to be compared.
Instead of repeatedly evaluating several coherence mechanisms
based on intuitive reasoning, which is the usual
case, a thorough understanding of the sharing behaviour
can guide the designer in choosing a simple, but still efficient,
coherence mechanism.
In this paper we define some concepts for reasoning
about sharing behaviour in relation to the performance of
shared memory coherence mechanisms. We have found that
there are four critical parameters for describing the sharing
[Figure 1. The distributed shared memory multiprocessor. Each processing node pairs a processor (P) with a local memory (LM); the nodes are connected by a network.]
Visualising Sharing Behaviour in relation to Shared Memory Management
Mats Brorsson and Per Stenström
Department of Computer Engineering, Lund University
P.O. Box 118, S-221 00 LUND, Sweden
Proceedings of 1992 International Conference
on Parallel and Distributed Systems
Dec. 16-18, 1992, Hsinchu, Taiwan, R.O.C.
pages 528 - 536