Parallel programs that use critical sections and are executed on a shared-memory multiprocessor with a write-invalidate protocol result in invalidation actions that could be eliminated. For this type of sharing, called migratory sharing, each processor typically causes a cache miss followed by an invalidation request, which could be merged with the preceding cache-miss request.
In this paper we propose an adaptive protocol that invokes this optimization dynamically for migratory blocks. For other blocks, the protocol works as an ordinary write-invalidate protocol. We show that the protocol is a simple extension to a write-invalidate protocol.
Based on a program-driven simulation model of an architecture similar to the Stanford DASH, and a set of four benchmarks, we evaluate the potential performance improvements of the protocol. We find that it effectively eliminates most single invalidations, which improves performance by reducing both the shared access penalty and the network traffic.
For shared-memory multiprocessors to achieve high performance, memory system latency and contention must be kept as low as possible. A viable solution to this problem has been to attach private caches to each processing node and maintain cache coherence using a directory-based write-invalidate protocol. Notable examples of real implementations of large-scale multiprocessors that exploit this technique are the Stanford DASH, the MIT Alewife, the SICS Data Diffusion Machine (the DDM), and Kendall Square Research's KSR1.
Write-invalidate protocols maintain cache coherence by invalidating copies of a memory block when the block is modified by a processor. The advantage of this is that at most the first write, in a sequence of writes to the same block with no intervening read operations from other processors, causes global interaction. Consequently, write-invalidate protocols perform fairly well for a broad range of sharing patterns. However, there exist common sharing patterns for which all invalidations could be avoided entirely. A notable example is the invalidation overhead associated with data structures that are accessed within critical sections. Typically, processors read and modify such data structures one at a time. A processor that accesses data this way causes a cache miss followed by an invalidation request that is sent to the cache attached to the processor that most recently exited the critical section. If the cache coherence protocol were aware of this sharing pattern, it would be possible to merge the invalidation request with the preceding read-miss request and thus eliminate all explicit invalidation actions. This sharing behavior, denoted migratory sharing, was previously shown by Gupta and Weber to be the major source of single invalidations.
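As a rough illustration of the request counts involved, the following Python sketch models a single memory block under a plain write-invalidate policy and under a variant in which every read miss also acquires ownership. The model, the `Block` class, and the `migratory_trace` and `run` helpers are hypothetical simplifications introduced here for illustration; they are not the DASH protocol, nor the protocol proposed in this paper.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Block:
    """Simplified, hypothetical directory view of one memory block."""
    sharers: Set[int] = field(default_factory=set)  # caches with a valid copy
    owner: Optional[int] = None                     # cache with the exclusive copy


def migratory_trace(n_procs, rounds):
    """Each processor in turn reads the block and then writes it twice,
    as it would inside a critical section."""
    trace = []
    for _ in range(rounds):
        for p in range(n_procs):
            trace += [(p, "R"), (p, "W"), (p, "W")]
    return trace


def run(trace, merge):
    """Return (misses, invalidations), i.e. the global requests issued.

    merge=False: plain write-invalidate.
    merge=True:  every read miss also acquires ownership (assumed variant).
    """
    block = Block()
    misses = invalidations = 0
    for proc, op in trace:
        has_copy = proc in block.sharers
        if op == "R":
            if has_copy:
                continue                  # read hit, no global request
            misses += 1
            if merge:
                block.sharers = {proc}    # fetch and invalidate in one request
                block.owner = proc
            else:
                block.sharers.add(proc)   # previous owner keeps a shared copy
                block.owner = None
        else:
            if block.owner == proc:
                continue                  # later writes in the sequence are local
            if has_copy:
                invalidations += 1        # separate invalidation request
            else:
                misses += 1               # write miss fetches with ownership
            block.sharers = {proc}
            block.owner = proc
    return misses, invalidations
```

For `migratory_trace(4, 2)`, the sketch counts one miss and one invalidation per critical-section turn under the plain policy, and one miss and no invalidation per turn when the two requests are merged. In both cases the second write of each turn stays local, matching the observation that at most the first write in a sequence causes global interaction.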
Eliminating invalidation requests can help performance in two important ways. First, if access requests cannot overlap invalidation requests due to memory consistency model or implementation constraints, the access penalty is reduced by reducing the number of global invalidation requests. Second, the network traffic is reduced, which, as a secondary effect, may reduce the read and write penalty due to less network contention. Consequently, reducing the number of invalidation requests may improve performance significantly.
In this paper, we propose an implementation of an adaptive write-invalidate protocol that effectively eliminates most invalidation requests associated with migratory sharing. The protocol dynamically detects whether or not a memory block exhibits migratory sharing. For blocks deemed migratory, the invalidation request is merged with the preceding read-miss request; for other blocks, the protocol maintains coherence according to the default write-invalidate policy. In addition, the protocol can dynamically, on a per-block basis, switch between these operating modes should the block change sharing behavior. As a case study, we show that our protocol is a simple extension of a write-invalidate protocol by presenting the modifications needed for a state-of-the-art write-invalidate protocol, in essence the directory-based protocol of the Stanford DASH.
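The per-block mode switching described above can be sketched as follows. This section does not state the detection rule, so the one used here is an assumption made purely for illustration: a block is deemed migratory when an invalidation request arrives while exactly two caches hold copies and the other copy belongs to the last writer, and it reverts to the default policy when a holder passes the block on without having written it. The `AdaptiveBlock` class and all of its fields are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class AdaptiveBlock:
    """Hypothetical per-block state for an adaptive write-invalidate policy."""
    sharers: Set[int] = field(default_factory=set)
    exclusive: Optional[int] = None    # cache holding the only, writable copy
    dirty: bool = False                # has the exclusive holder written yet?
    last_writer: Optional[int] = None
    migratory: bool = False
    invalidations: int = 0             # explicit invalidation requests seen

    def read(self, p):
        if p in self.sharers:
            return                     # read hit
        if self.migratory:
            if self.exclusive is not None and not self.dirty:
                # Assumed revert rule: the holder only read the block,
                # so fall back to the default policy for this block.
                self.migratory = False
            else:
                # Merged request: the read miss also takes ownership.
                self.sharers = {p}
                self.exclusive = p
                self.dirty = False
                return
        # Default write-invalidate read miss: add a shared copy.
        self.sharers.add(p)
        self.exclusive = None
        self.dirty = False

    def write(self, p):
        if self.exclusive == p:
            self.dirty = True          # local write, no global request
            self.last_writer = p
            return
        if p in self.sharers:
            self.invalidations += 1    # explicit invalidation request
            # Assumed detection rule: two copies, the other one belonging
            # to the last writer, suggests read-modify-write migration.
            if (len(self.sharers) == 2
                    and self.last_writer is not None
                    and self.last_writer != p
                    and self.last_writer in self.sharers):
                self.migratory = True
        # A write miss fetches the block with ownership in one request.
        self.sharers = {p}
        self.exclusive = p
        self.dirty = True
        self.last_writer = p
```

In this sketch, if four processors take turns reading and then writing the block, only the first two turns issue explicit invalidations; from the third turn on the block is treated as migratory and each turn costs a single merged request. A later phase of read-only accesses switches the block back to the default policy, so that read sharing is not penalized.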
To validate the correctness of the protocol and evaluate its performance, we have implemented and evaluated it using a detailed program-driven simulation model of a DASH-like architecture and a set of four benchmarks, of which three are taken from the SPLASH suite. We have found that by eliminating the invalidation requests to
An Adaptive Cache Coherence Protocol
Optimized for Migratory Sharing
Per Stenström, Mats Brorsson, and Lars Sandberg
Department of Computer Engineering, Lund University
P.O. Box 118, S-221 00 LUND, Sweden
0884-7495/93 $3.00 © 1993 IEEE
In Proceedings of the 20th Annual International Symposium on Computer Architecture, San Diego, California, May 1993