
Using Hints to Reduce the Read Miss Penalty for Flat COMA Protocols*

Mårten Björkman, Fredrik Dahlgren, and Per Stenström

Department of Computer Engineering, Lund University

P.O. Box 118, S-221 00 LUND, Sweden

Abstract

In flat COMA architectures, an attraction-memory miss must first interrogate a directory before a copy of the requested data can be located, which often involves three network traversals. By keeping track of the identity of a potential holder of the copy, called a hint, one network traversal can be saved, which reduces the read penalty.

We have evaluated the reduction of the read miss penalty provided by hints using detailed architectural simulations and four benchmark applications. The results show that a previously proposed protocol using hints can actually make the read miss penalty larger because, when the hint is not correct, an extra network traversal is needed. This has motivated us to study a new protocol using hints that simultaneously sends a request to the potential holder and to the directory. This protocol reduces the read miss penalty for all applications, but the performance improvement does not seem to justify the protocol complexity.

1. Introduction

Cache-coherent NUMA (CC-NUMA) and cache-only memory architectures (COMA) are two emerging styles of building scalable shared-memory architectures. Examples of the former type include the Stanford DASH [14] and the MIT Alewife [1], whereas the Swedish Institute of Computer Science's Data Diffusion Machine (DDM) [13] and Kendall Square Research's KSR1 [4] are examples of the latter type. Both styles use processing nodes that consist of processors, caches, and a portion of the distributed main memory. In contrast to CC-NUMA machines, main memory in COMA is converted into huge caches, called attraction memories, that support replication of data not only across caches but also across memories.

The main advantage of COMA compared to CC-NUMA machines is that a vast majority of replacement cache misses can be handled in the local attraction memory [17]. However, to handle cache misses that cannot be serviced locally, a mechanism is needed that locates the node in which a copy of the memory block resides. The DDM and the KSR1, examples of hierarchical COMAs, use a hierarchical directory structure.

Therefore, the latency of locating a copy can include several directory lookups. This is in contrast to CC-NUMA machines, where a single directory lookup locates a copy.

COMA machines can also locate copies with a single directory lookup, as in CC-NUMA machines, which is the basic idea behind the flat COMA (COMA-F) proposal by Stenström et al. [17]. When a cache miss cannot be serviced by the local attraction memory, the miss request is sent to a directory, which then forwards the request to an attraction memory that keeps a copy of the block. This attraction memory then returns the copy to the requesting node. While the whole transaction often includes three network traversals, two would suffice if the requesting node knew where to retrieve a copy.
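
To make the three-traversal path concrete, the following minimal sketch in C traces the messages of such a remote read miss (this is our illustration, not code from [17]; the node numbers and the send() helper belong to an assumed uniform-cost model):

    /* Minimal sketch of a flat COMA read miss (illustrative only).
     * The home directory forwards the miss request to a node that
     * holds a copy of the block. */
    #include <stdio.h>

    static int traversals;               /* network hops taken so far */

    static void send(int from, int to)   /* one network traversal */
    {
        traversals++;
        printf("  node %d -> node %d\n", from, to);
    }

    int main(void)
    {
        int requester = 0;   /* node that misses in its attraction memory */
        int home      = 1;   /* node holding the directory entry */
        int holder    = 2;   /* node the directory entry points to */

        send(requester, home);     /* 1: miss request to the directory */
        send(home, holder);        /* 2: directory forwards the request */
        send(holder, requester);   /* 3: holder returns the copy */

        printf("read miss serviced in %d traversals\n", traversals);
        return 0;
    }

With a correct hint, the first two messages collapse into a single requester-to-holder traversal, which gives the two-traversal case discussed next.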

To be able to send the read-miss request directly to a holder of the block, one can associate an identifier of the potential block holder, called a hint, with each attraction-memory block frame. This concept was incorporated in a COMA-F protocol by Gupta et al. [12]. If the hint is correct, the copy is retrieved in two network traversals, but if the hint turns out to be wrong, the directory has to be interrogated. Since this costs the latency of an extra network traversal, the hints must be correct in at least fifty percent of the cases to reduce the read miss penalty.
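
To see where the fifty-percent threshold comes from, consider a simplified cost model (ours, for illustration; it assumes every network traversal costs the same time $T$). Let $p$ be the probability that a hint is correct. A hinted miss then costs $2T$ when the hint is right and $4T$ when it is wrong (the wasted traversal to the hinted node plus the ordinary three-traversal directory path), so the expected miss latency is

$$E[L] = p \cdot 2T + (1 - p) \cdot 4T = (4 - 2p)\,T,$$

which falls below the hint-free cost of $3T$ exactly when $p > 1/2$.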

In this paper, we evaluate the performance improvement from hints on a simulated flat COMA machine and four benchmark applications. When we evaluate the protocol using hints according to [12], we find that the savings can be offset by the extra network traversals associated with unsuccessful hints. This motivates us to study a new protocol using hints that simultaneously sends a request to the potential holder as well as to the directory; clearly, unsuccessful hints do not introduce extra miss latency in this protocol. Although we find that this protocol cuts the read miss penalty, in some cases by 14%, the improvement is limited by the small fraction of misses that can use hints and by the low success rate of hints.
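
The latency difference between the two hint protocols can be summarized in the same illustrative uniform-cost model (a sketch under our assumptions; the function names are ours):

    #include <stdio.h>

    /* Read-miss latency in network traversals (multiples of T). */

    static int serial_hint(int hint_ok)    /* hint first, then directory [12] */
    {
        /* A wrong hint wastes one traversal before the ordinary
         * three-traversal directory path is taken: 1 + 3 = 4. */
        return hint_ok ? 2 : 4;
    }

    static int parallel_hint(int hint_ok)  /* hint and directory request at once */
    {
        /* The directory request proceeds regardless, so a wrong hint
         * adds no latency beyond the usual three traversals. */
        return hint_ok ? 2 : 3;
    }

    int main(void)
    {
        int ok;
        for (ok = 0; ok <= 1; ok++)
            printf("hint %s: serial %dT, parallel %dT\n",
                   ok ? "correct" : "wrong", serial_hint(ok), parallel_hint(ok));
        return 0;
    }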

In the next section, we review the latency associated with read misses in protocols for CC-NUMA and COMA machines, and Section 3 presents a new COMA protocol that uses hints to reduce the read miss penalty. We present our architectural simulation results in Sections 4 and 5 before we conclude in Section 6.
*This work was supported by the Swedish National Board for Technical Development (NUTEK) under contract P855.


In Proc. of the 1995 Int. Conf. on System Sciences (28th HICSS), vol. I, pp. 242-251, IEEE, Jan. 1995.