
Abstractions for Constructing Dependable Distributed Systems

Shivakant Mishra¹ and Richard D. Schlichting

TR 92-19


Distributed systems, in which multiple machines are connected by a communications network, are often used to build highly dependable computing systems. However, constructing the software required to realize such dependability is a difficult task, since it requires the programmer to build fault-tolerant software that can continue to function despite failures. To simplify this process, canonical structuring techniques or programming paradigms have been developed, including the object/action model, the primary/backup approach, the state machine approach, and conversations. In this paper, some of the system abstractions designed to support these paradigms are described. These abstractions, which are termed fault-tolerant services, can be categorized into two types. One type provides functionality similar to standard hardware or operating system services, but with improved semantics when failures occur; these include stable storage, atomic actions, resilient processes, and certain kinds of remote procedure call. The other type provides consistent information to all processors in a distributed system; these include common global time, group-oriented multicast, and membership services. In addition to describing the fundamental properties of these abstractions and their implementation techniques, the paper presents a hierarchy highlighting common dependencies between services. Finally, a number of systems that use these abstractions are surveyed, including the Advanced Automation System (AAS), Argus, Consul, Delta-4, ISIS, and MARS.

August 3, 1992

Department of Computer Science
The University of Arizona
Tucson, AZ 85721

¹ Current address: Dept. of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093, USA