
Maximizing Performance in a Striped Disk Array

Peter M. Chen, David A. Patterson

Computer Science Division, University of California, Berkeley

Abstract. Improvements in disk speeds have not kept up with improvements in processor and memory speeds. One way to correct the resulting speed mismatch is to stripe data across many disks. In this paper, we address how to stripe data to get maximum performance from the disks. Specifically, we examine how to choose the striping unit, i.e. the amount of logically contiguous data on each disk. We synthesize rules for determining the best striping unit for a given range of workloads.
We show how the choice of striping unit depends on only two parameters: 1) the number of outstanding requests in the disk system at any given time, and 2) the average positioning time × data transfer rate of the disks. We derive an equation for the optimal striping unit as a function of these two parameters; we also show how to choose the striping unit without prior knowledge about the workload.

1. Introduction
In recent years, computer technology has advanced at an astonishing rate: processor speed, memory speed, and memory size have all grown exponentially [Bell84, Joy85, Moore75, Myers86]. Disk speeds, however, have improved far more slowly. As a result, many applications are now limited by the speed of their disks rather than the power of their CPUs [Agrawal84, Johnson84]. As improvements in processor and memory speeds continue to outstrip improvements in disk speeds, more and more applications will become I/O limited.
One way to increase the data rate (bytes transferred per second) and the I/O rate (I/O requests per second) of a file system is to distribute, or stripe, the file system over multiple disks. In this paper, we examine how to choose the striping unit, i.e., the amount of logically contiguous data to store on each disk. If this choice is made incorrectly, 80% or more of the potential disk throughput can be lost. Our goal is to synthesize rules for determining the optimal striping unit under a variety of loads, request sizes, and disk hardware parameters.
We show how the choice of striping unit depends on only two parameters: 1) the number of outstanding requests in the disk system at any given time, and 2) the average positioning time × data transfer rate of the disks. We derive an equation for the optimal striping unit as a function of these two parameters; we also show how to choose the striping unit without prior knowledge about the workload.
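To give a feel for the second parameter, note that positioning time multiplied by transfer rate has units of data: it is the amount of data a disk could have transferred in the time it spends positioning once. The worked arithmetic below uses hypothetical disk values chosen only for illustration (they are not taken from this paper), and the symbols $P$ and $X$ are introduced here for convenience.

```latex
% Hypothetical values, for illustration only:
%   P = average positioning time = 20 ms
%   X = data transfer rate       = 1.5 * 10^6 bytes/s
\[
  P \times X
  = 0.020\ \text{s} \times 1.5 \times 10^{6}\ \text{bytes/s}
  = 3.0 \times 10^{4}\ \text{bytes}
  \approx 30\ \text{KB}
\]
```

Under these assumed values, a disk gives up roughly 30 KB of potential transfer each time it has to position.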

2. Definitions
We define the striping unit as the maximum amount of logically contiguous data that is stored on a single disk (see Figure 1). A large striping unit tends to keep a file clustered together on a few disks (possibly one); a small striping unit tends to spread each file across many disks. Unlike [Patterson88, Chen90], we do not include any redundant data in our data striping scheme; data from each file is simply distributed round-robin over the disks.
We use parallelism to describe the number of disks that service a user request for data. A higher degree of parallelism increases the transfer rate that each request sees. However, as more disks cooperate in servicing each request, fewer independent requests can be serviced simultaneously. We define the degree of concurrency of a workload as the average number of outstanding user requests in the system at one time. A small striping unit causes higher parallelism but supports less concurrency in the workload; a large striping unit causes little parallelism but supports more concurrency in the workload.
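The relationship between striping unit and parallelism can be sketched in a few lines of Python. This is a minimal illustration assuming the simple round-robin layout described above; the function name `disks_touched`, its parameters, and the 0-based disk numbering are assumptions made here, not definitions from the paper.

```python
def disks_touched(start_sector, num_sectors, striping_unit, num_disks):
    """Return the set of disks holding any part of a request.

    Assumes round-robin striping with no redundancy; sizes are in sectors
    and disks are numbered from 0 (a convention chosen for this sketch).
    """
    first_unit = start_sector // striping_unit
    last_unit = (start_sector + num_sectors - 1) // striping_unit
    # A request spanning at least num_disks striping units touches every disk.
    if last_unit - first_unit + 1 >= num_disks:
        return set(range(num_disks))
    return {unit % num_disks for unit in range(first_unit, last_unit + 1)}


# The same 16-sector request under two striping units on 4 disks.
small_unit = disks_touched(0, 16, striping_unit=2, num_disks=4)
large_unit = disks_touched(0, 16, striping_unit=32, num_disks=4)
print(len(small_unit), len(large_unit))
```

With the small striping unit the request occupies all four disks (high parallelism, so no other request can proceed independently); with the large one it occupies a single disk, leaving the other three free to service concurrent requests.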

3. Previous Work
Disk striping is not a new concept: Cray Research has been striping files over multiple disks for many years to increase data rate [Johnson84]. However, with the proliferation of smaller-diameter disk drives, striping over many disk drives could provide order-of-magnitude benefits in performance/cost, capacity/cost, power, and volume [Patterson88]. As a result, disk striping research has increased dramatically over the past few years.
Kim [Kim86] proposes a striping unit of one byte (byte-interleaving). Using queuing models, she finds that, under light loads, byte-interleaving yields higher throughput than a collection of non-cooperating disks

[Figure 1 diagram: logical sectors laid out across the disks, two consecutive sectors per disk.]

Figure 1: Definition of a Striping Unit. This figure shows the mapping of logical data to the disks for a striping unit of two sectors. The numbers in the figure are logical sectors; the two circled sectors constitute one striping unit.
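The mapping in Figure 1 can be reproduced with a short sketch. It assumes round-robin placement with a striping unit of two sectors, as the caption states; the choice of four disks, the 0-based disk numbering, and the helper names `locate` and `layout` are assumptions made for this illustration.

```python
def locate(sector, striping_unit, num_disks):
    """Map a logical sector to (disk, sector offset on that disk)."""
    unit = sector // striping_unit              # which striping unit overall
    disk = unit % num_disks                     # units are dealt round-robin
    offset = (unit // num_disks) * striping_unit + sector % striping_unit
    return disk, offset


def layout(num_sectors, striping_unit, num_disks):
    """List the logical sectors stored on each disk, in on-disk order."""
    disks = [[] for _ in range(num_disks)]
    for sector in range(num_sectors):
        disk, _ = locate(sector, striping_unit, num_disks)
        disks[disk].append(sector)
    return disks


# Striping unit of two sectors over four (assumed) disks, sectors 0..15.
for disk, sectors in enumerate(layout(16, striping_unit=2, num_disks=4)):
    print(f"disk {disk}: {sectors}")
```

Each consecutive pair of logical sectors lands on one disk before the layout moves on to the next disk, and the pattern wraps around once every disk has received a striping unit.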