Accesses to shared memory remain a major performance limitation in shared memory multiprocessors. Scalable multiprocessors with distributed memory also pose the problem of keeping the memory coherent. A large number of shared memory coherence mechanisms have been proposed to solve this problem. Their relative performance is, however, determined by the sharing behaviour of the workloads.
This paper presents a methodology to capture and visualise the sharing behaviour of a parallel program with respect to the choice of coherence mechanism. We identify four conceptual workload parameters: spatial granularity, degree of sharing, access mode, and temporal granularity. To demonstrate the effectiveness of the methodology, we have analysed the sharing behaviour of two parallel applications. The results are used to judge which shared memory coherence mechanism is most appropriate.
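As a rough illustration only (not the authors' definitions), the Python sketch below assumes a hypothetical trace of (processor, address, read/write) records and derives, for each memory block, a degree of sharing and a coarse access mode, with the block size standing in for spatial granularity. The names `Access`, `BlockProfile` and `profile_trace`, and the three access-mode labels, are assumptions made for this sketch; temporal granularity is not modelled.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Access:
    """Hypothetical trace record: one shared-memory reference."""
    proc: int       # issuing processor
    addr: int       # byte address
    is_write: bool  # True for a write, False for a read


@dataclass
class BlockProfile:
    """Which processors read and which wrote one memory block."""
    readers: set = field(default_factory=set)
    writers: set = field(default_factory=set)

    @property
    def degree_of_sharing(self) -> int:
        # Number of distinct processors that referenced the block.
        return len(self.readers | self.writers)

    @property
    def access_mode(self) -> str:
        # Coarse classification chosen for this sketch only.
        if self.degree_of_sharing == 1:
            return "private"
        if not self.writers:
            return "read-only shared"
        return "read-write shared"


def profile_trace(trace, block_size):
    """Map each block number (addr // block_size) to its BlockProfile."""
    profiles = defaultdict(BlockProfile)
    for a in trace:
        prof = profiles[a.addr // block_size]
        (prof.writers if a.is_write else prof.readers).add(a.proc)
    return dict(profiles)


if __name__ == "__main__":
    trace = [
        Access(0, 0, False), Access(1, 4, False),   # two readers, same 64-byte block
        Access(0, 64, True), Access(1, 70, False),  # a writer and a reader share a block
        Access(2, 128, True),                       # touched by one processor only
    ]
    for block, prof in sorted(profile_trace(trace, block_size=64).items()):
        print(block, prof.degree_of_sharing, prof.access_mode)
```

Running the same trace with a 4-byte block instead of a 64-byte block splits block 0, so processors 0 and 1 no longer appear to share it; this shows how the chosen spatial granularity changes the measured degree of sharing.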
The shared memory paradigm for programming parallel applications on multiprocessors and other environments has emerged as the preferred paradigm over alternatives such as the message passing model. Several small-scale, bus-based shared memory multiprocessors are now commercially available, and numerous research projects are under way on large-scale shared memory multiprocessors, e.g. [1, 15]. In addition, a number of ongoing projects aim to provide a shared memory environment on distributed memory machines or to use a network of workstations as a shared memory multicomputer.
In order to make shared memory multiprocessors scalable, recent focus has been on distributed shared memory, where the memory is distributed among the processing nodes. Figure 1 shows a general view of such an architecture. A processor together with some memory forms a processing node, and a number of processing nodes are connected by an interconnection network. The local memory of a processing node is directly accessible by its processor. The contents of the memory in the other processing nodes are accessible either directly or through some software mechanism; in both cases, the access costs substantially more than an access to local memory.
The distribution of memory across the processing nodes makes it very important to exploit the locality of reference of parallel programs in order to minimise the average access time to shared memory. One approach to automating this is to use memory coherence mechanisms that allow shared variables to be replicated or automatically migrated between the processing nodes.
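To make the cost argument concrete, assume (a simplification of our own, not a model given in the paper) that a fraction f of shared-memory references are served by local memory at cost t_local and the remainder by remote memory at cost t_remote. The average access time is then

```latex
\bar{t} = f\,t_{\mathrm{local}} + (1 - f)\,t_{\mathrm{remote}}, \qquad 0 \le f \le 1
```

With hypothetical costs t_local = 1 and t_remote = 20 cycles, raising f from 0.5 to 0.9 lowers the average from 0.5 + 10 = 10.5 cycles to 0.9 + 2 = 2.9 cycles. Replication and migration aim for this kind of gain by turning remote references into local ones.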
Several memory coherence mechanisms have been proposed in the literature [8, 14, 20]. They encompass both cache coherence maintenance and virtual-page-level management. The trend in these mechanisms is to make some assumptions about program behaviour in order to reduce complexity and implementation cost while still retaining good performance. The various memory coherence mechanisms have usually been evaluated through test implementations, using either synthetic workloads or some anticipated representative set of programs. This is very time-consuming and costly if a large number of mechanisms are to be compared. Rather than repeatedly evaluating several coherence mechanisms chosen by intuitive reasoning, which is the usual practice, the designer can rely on a thorough understanding of the sharing behaviour to choose a simple, but still efficient, coherence mechanism.
In this paper we define some concepts for reasoning about sharing behaviour in relation to the performance of shared memory coherence mechanisms. We have found that there are four critical parameters for describing the sharing
. . .
Figure 1. The distributed shared memory multiprocessor.
Visualising Sharing Behaviour in relation to Shared Memory Management
Mats Brorsson and Per Stenström
Department of Computer Engineering, Lund University P.O. Box 118, S-221 00 LUND, Sweden
Proceedings of the 1992 International Conference on Parallel and Distributed Systems
Dec. 16-18, 1992, Hsinchu, Taiwan, R.O.C.
pages 528 - 536