
Coding Techniques for Handling Failures in Large Disk Arrays¹

Lisa Hellerstein², Garth A. Gibson³, Richard M. Karp⁴,
Randy H. Katz⁴ and David A. Patterson⁴

Abstract: The goal of the Berkeley RAID (Redundant Arrays of Inexpensive Disks) project is to design and build a high-performance, reliable I/O system consisting of many inexpensive disks that can be accessed in parallel. One concern in building such a system is protecting against data loss. Although single disks today are highly reliable, when a system consists of 100 or 1000 disks, the probability that at least one disk will fail within a day or a week is high. In this paper, we present erasure-correcting linear codes designed to protect against data loss in systems using large numbers of disks. We discuss the issues involved in designing such codes, and show that our codes are optimal with respect to important reliability and performance constraints.

Keywords: Input/Output architecture, redundant disk arrays, error-correcting codes, reliability, availability.
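The abstract's claim that a disk failure becomes likely in a large array follows from simple arithmetic. The sketch below is a minimal illustration under assumptions that are not taken from this paper: disks fail independently, lifetimes are exponentially distributed, and each disk has a hypothetical mean time to failure (MTTF) of 30,000 hours. If one disk survives t hours with probability e^(-t/MTTF), then all N disks survive with probability e^(-Nt/MTTF), so the chance that at least one fails is 1 - e^(-Nt/MTTF). The function name `prob_any_failure` is illustrative only.

```python
import math


def prob_any_failure(num_disks: int, hours: float, mttf_hours: float) -> float:
    """Probability that at least one of num_disks fails within `hours`.

    Assumes (hypothetically) independent disks whose lifetimes are
    exponentially distributed with mean mttf_hours.
    """
    # One disk survives the interval with probability exp(-hours / mttf_hours).
    per_disk_survival = math.exp(-hours / mttf_hours)
    # The array loses no disk only if every disk survives.
    return 1.0 - per_disk_survival ** num_disks


if __name__ == "__main__":
    # Illustrative per-disk MTTF; an assumption, not a figure from this paper.
    MTTF = 30_000.0
    for n in (1, 100, 1000):
        for label, hrs in (("day", 24.0), ("week", 168.0)):
            p = prob_any_failure(n, hrs, MTTF)
            print(f"{n:5d} disks, one {label}: P(at least one failure) = {p:.3f}")
```

Under these assumptions, a single disk almost never fails within a day, while a 1000-disk array has better than even odds of losing at least one disk in that time and is nearly certain to within a week.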

1. Background

In recent years, processing power has increased dramatically through advanced VLSI technology [Myers86, Gelsinger89] and parallel architectures [Bell85, Bell89]. As processing power increases, so does the demand for Input/Output (I/O) performance. The mainstay of on-line secondary storage, the magnetic disk, provides neither the data rates required by applications that process large amounts of sequential data nor the access rates required by applications that perform large numbers of random accesses [Boral83]. This widening gap has led to I/O systems that achieve performance through disk parallelism, using techniques such as disk striping [Chen90, Kim87, Klietz88, Livny87, Salem86].

_____________________________
1 This paper is a revised and expanded version of material that appeared in the Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS III), Boston, March 1989 [Gibson89a].
2 Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL 60208.
3 School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213-3890.
4 Computer Science Division, Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720.