
Preliminary draft to appear, ICPP '94

An Integrated Methodology for the Verification of Directory-based Cache Protocols

Fong Pong, Per Stenström* and Michel Dubois

Abstract -- The complexity of directory-based protocols has motivated us to build a hardware emulator, or testbed, for the rapid prototyping of various protocols under various memory consistency models for CC-NUMA architectures. To implement and verify new protocols rapidly on the testbed, we have developed an overall methodology around a set of tools, each applicable to a different aspect of protocol verification: protocol-intrinsic errors, memory access ordering errors and protocol implementation errors. These tools include formal verification techniques, architecture simulators, and hardware mechanisms implemented in the FPGAs of the testbed.

1. Introduction

This paper addresses the verification of directory-based protocols in the Cache Coherent Non-Uniform Memory Access (CC-NUMA) architectures shown in Figure 1. Each processor has a private cache and a fraction of the shared memory attached to it. Coherence among caches is maintained by a hardware protocol. Each block has a home node (or home), the processor node where the block is allocated in memory and where its directory state is recorded. Memory requests must first access the home node, where the global state of the block is known. The home node satisfies the request when it is safe to do so.
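For concreteness, the following C sketch shows how a home node's directory entry might be organized and consulted on a read miss. It is our own illustration, not the testbed's actual data structures; the node count, the state names and the interleaved placement policy are all assumptions.

    #include <stdint.h>

    #define NODES 16                /* assumed number of processor nodes     */

    typedef enum { UNCACHED, SHARED, EXCLUSIVE } DirState;

    /* One directory entry per memory block, kept at the block's home node. */
    typedef struct {
        DirState state;             /* global state of the block             */
        uint32_t sharers;           /* bit vector of caches holding a copy   */
    } DirEntry;

    /* Home node of a block: simple interleaving on the block number
     * (an illustrative placement policy, not one taken from this paper).   */
    static int home_node(uint64_t block) { return (int)(block % NODES); }

    /* The home node serves a read miss from 'requester' only when it is
     * safe to do so: a dirty copy must first be recalled from its owner.   */
    void serve_read_miss(DirEntry *e, int requester) {
        if (e->state == EXCLUSIVE) {
            /* ... recall the dirty copy and downgrade the current owner ... */
        }
        e->sharers |= 1u << requester;  /* record the new sharer             */
        e->state = SHARED;
        /* ... reply to the requester with the data ...                      */
    }

The key point is that all the information needed to decide whether a request is safe is centralized at the home node, which is what makes the directory state a natural unit of verification.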

FIGURE 1. Cache Coherent Non-Uniform Memory Access (CC-NUMA) Architecture

The complexity of designing correct and efficient directory-based protocols has led us to develop a methodology to validate them. For this purpose, we are building at the University of Southern California a hardware testbed tailored to the evaluation of these protocols in the context of CC-NUMA architectures with directories associated with memory. Therefore, the testbed is not appropriate for snooping protocols or for protocols with directories implemented as linked lists connecting block frames in different caches, such as SCI. The testbed can be seen as a hardware platform which can be programmed to emulate faithfully and efficiently the behavior of complex protocols with complex memory latency tolerance mechanisms. The testbed is orders of magnitude faster than software simulations and allows us to run real-size workloads and operating system code in a reasonable amount of time. Moreover, the testbed emulates all the details of the architecture and is therefore more reliable than software simulations, which often take shortcuts and must abstract behavior in order to run efficiently.
Each protocol must be carefully verified before its implementation on the testbed because protocol bugs are hard to detect and identify on the machine. Since we will design a large number of complex protocols, an efficient procedure must be put in place to minimize the number of bugs. We cannot rely solely on detailed software simulations, as was done in [5], where a simplified version of the operating system was executed on top of a software simulator at the speed of one machine cycle per second. Instead, we use a hierarchy of tools, each with its own merits in terms of ease of use and the type of bug it detects.

This hierarchy is motivated by the fact that different models of the protocol can be built at different levels of abstraction. In general, more abstract protocol models lead to better error coverage; however, the abstraction may be ill-conceived or may overlook some critical aspects of the protocol. At the highest level of abstraction, we can view the protocol states (cache and memory states) as a finite state machine. At this level, protocol-intrinsic errors can be detected, including data inconsistencies, unspecified message receptions due to omissions in the protocol specification, deadlocks and livelocks (a toy sketch of this finite-state view follows the list below). Another aspect is the global ordering of all memory accesses and its adherence to the memory consistency model expected by the programmer. Unfortunately, we do not yet have good architectural abstractions to verify ordering formally; we must rely on exhaustive testing of simulations and prototype implementations. Finally, implementation-related errors due to finite buffers and to oversights in the protocol abstraction also require simulation and prototyping. The different tools are:

• The Stanford Murφ system, which is easy to program and helps detect most bugs associated with consistency at some level of hardware abstraction, but whose error coverage is less than 100%,

• A tool based on a Symbolic State Model (SSM), which (in its current version) is harder to use than Murφ and abstracts the hardware even further, but has 100% error coverage and yields efficient detection of protocol-intrinsic livelocks and deadlocks,
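To make the finite-state view concrete, here is a toy exhaustive-exploration sketch in C. It is our own illustration, neither Murφ nor the SSM tool: it enumerates every reachable global state of an assumed two-cache MSI protocol and checks a coherence invariant in each state, which is how data inconsistencies and omitted cases are exposed at this level of abstraction.

    #include <stdio.h>
    #include <stdbool.h>

    typedef enum { I, S, M } CState;           /* Invalid, Shared, Modified */
    typedef struct { CState c[2]; } Global;    /* global state: two caches  */

    static bool visited[3][3];                 /* reachable-state table     */

    /* Coherence invariant: a Modified copy excludes every other copy.     */
    static bool coherent(Global g) {
        return !(g.c[0] == M && g.c[1] != I) &&
               !(g.c[1] == M && g.c[0] != I);
    }

    /* Processor p issues a read (op == 0) or a write (op == 1); the other
     * cache is downgraded or invalidated as the toy protocol prescribes.  */
    static Global step(Global g, int p, int op) {
        int q = 1 - p;
        if (op == 0) {                         /* read                      */
            if (g.c[p] == I) g.c[p] = S;
            if (g.c[q] == M) g.c[q] = S;       /* downgrade dirty copy      */
        } else {                               /* write                     */
            g.c[p] = M;
            g.c[q] = I;                        /* invalidate other copy     */
        }
        return g;
    }

    /* Depth-first exploration of every reachable global state.            */
    static void explore(Global g) {
        if (visited[g.c[0]][g.c[1]]) return;
        visited[g.c[0]][g.c[1]] = true;
        if (!coherent(g))
            printf("inconsistency in state (%d,%d)\n", g.c[0], g.c[1]);
        for (int p = 0; p < 2; p++)
            for (int op = 0; op < 2; op++)
                explore(step(g, p, op));
    }

    int main(void) {
        explore((Global){{ I, I }});           /* both caches start Invalid */
        int n = 0;
        for (int a = 0; a < 3; a++)
            for (int b = 0; b < 3; b++)
                n += visited[a][b];
        printf("%d reachable states, all checked\n", n);
        return 0;
    }

Real protocols have vastly larger state spaces; the purpose of a more abstract model such as SSM is precisely to keep this kind of exhaustive sweep tractable while preserving 100% coverage.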

Department of Electrical Engineering Systems,

University of Southern California,

Los Angeles, CA 90089-2562, USA

[email protected]

*Department of Computer Engineering,

Lund University

P.O. Box 118, S-221 00 LUND, Sweden

[email protected]

This work was supported by the National Science Foundation under Grants No. CCR-9115725 and MIP-9223812.