Maximizing Performance in a Striped Disk Array
Peter M. Chen, David A. Patterson
Computer Science Division, University of California, Berkeley
Abstract. Improvements in disk speeds have not kept up
with improvements in processor and memory speeds. One
way to correct the resulting speed mismatch is to stripe
data across many disks. In this paper, we address how to
stripe data to get maximum performance from the disks.
Specifically, we examine how to choose the striping unit,
i.e. the amount of logically contiguous data on each disk.
We synthesize rules for determining the best striping unit
for a given range of workloads.
We show how the choice of striping unit depends on
only two parameters: 1) the number of outstanding requests
in the disk system at any given time, and 2) the average
positioning time × data transfer rate of the disks. We
derive an equation for the optimal striping unit as a function
of these two parameters; we also show how to choose
the striping unit without prior knowledge about the workload.
1. Introduction
In recent years, computer technology has advanced
at an astonishing rate: processor speed, memory speed, and
memory size have all grown exponentially
[Bell84, Joy85, Moore75, Myers86]. However, disk
speeds have improved at a far slower rate. As a result,
many applications are now limited by the speed of their
disks rather than the power of their CPUs [Agrawal84,
Johnson84]. As improvements in processor and memory
speeds continue to outstrip improvements in disk speeds,
more and more applications will become I/O limited.
One way to increase the data rate (bytes transferred
per second) and the I/O rate (I/O requests per second) from
a file system is by distributing, or striping, the file system
over multiple disks. In this paper, we examine how to
choose the striping unit, i.e. the amount of logically contiguous
data to store on each disk. If this choice is made
incorrectly, 80% or more of the potential disk throughput
can be lost. Our goal is to synthesize rules for determining
the optimal striping unit under a variety of loads, request
sizes, and disk hardware parameters.
We show how the choice of striping unit depends on
only two parameters: 1) the number of outstanding requests
in the disk system at any given time, and 2) the average
positioning time × data transfer rate of the disks. We
derive an equation for the optimal striping unit as a function
of these two parameters; we also show how to choose
the striping unit without prior knowledge about the workload.
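The second parameter is a product of a time and a rate, so it has units of data: it is the amount of data a disk could transfer in the time it takes to position once. The sketch below only illustrates that arithmetic. The function name and the disk figures (20 ms positioning, 1.5 MB/s transfer) are hypothetical and not taken from the paper, and the sketch does not reproduce the equation for the optimal striping unit.

```python
def data_per_positioning(avg_positioning_time_s: float,
                         transfer_rate_bytes_per_s: float) -> float:
    """Bytes a disk could transfer during one average positioning delay."""
    return avg_positioning_time_s * transfer_rate_bytes_per_s


# Hypothetical disk (assumed values, not from the paper):
# 20 ms average positioning time and 1.5 MB/s transfer rate.
product = data_per_positioning(0.020, 1.5e6)
print(f"positioning time x transfer rate = {product / 1000:.0f} KB")
```

For these assumed values the product is roughly 30 KB.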
2. Definitions
We define the striping unit as the maximum amount
of logically contiguous data that is stored on a single disk
(see Figure 1). A large striping unit will tend to keep a file
clustered together on a few disks (possibly one); a small
striping unit tends to spread each file across many disks.
Unlike [Patterson88, Chen90], we do not include any redundant data in our data striping scheme; data from each file is simply distributed round-robin over the disks.
We use parallelism to describe the number of disks that service a user request for data. A higher degree of parallelism increases the transfer rate that each request sees. However, as more disks cooperate in servicing each request, fewer independent requests can be serviced simultaneously.
We define the degree of concurrency of a workload as the average number of outstanding user requests in the system at one time. A small striping unit causes higher parallelism but supports less concurrency in the workload; a large striping unit causes little parallelism but supports more concurrency in the workload.
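The round-robin mapping and the notion of parallelism defined above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the 0-based disk numbering, and the four-disk array in the usage lines are assumptions made for the example.

```python
from typing import Tuple


def locate(sector: int, striping_unit: int, num_disks: int) -> Tuple[int, int]:
    """Map a logical sector to (disk index, sector offset on that disk)."""
    unit = sector // striping_unit      # striping unit that holds the sector
    disk = unit % num_disks             # units are dealt round-robin over disks
    row = unit // num_disks             # full stripes that precede this unit
    offset = row * striping_unit + sector % striping_unit
    return disk, offset


def disks_touched(start: int, length: int,
                  striping_unit: int, num_disks: int) -> int:
    """Parallelism of one request: number of distinct disks it touches."""
    first_unit = start // striping_unit
    last_unit = (start + length - 1) // striping_unit
    # Consecutive units land on consecutive disks, so the count is the
    # number of units spanned, capped at the size of the array.
    return min(last_unit - first_unit + 1, num_disks)


# Assumed array: 4 disks, striping unit of 2 sectors.
print(locate(5, 2, 4))              # disk and offset for logical sector 5
print(disks_touched(0, 16, 2, 4))   # small striping unit: many disks per request
print(disks_touched(0, 16, 16, 4))  # large striping unit: one disk per request
```

With the same 16-sector request, the small striping unit involves every disk in the assumed array, while the large striping unit leaves the other three disks free to serve independent requests.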
3. Previous Work
Disk striping is not a new concept: Cray Research
has been striping files over multiple disks for many years to
increase data rate [Johnson84]. However, with the proliferation
of smaller-diameter disk drives, striping over many
disk drives could provide order-of-magnitude benefits in
performance/cost, capacity/cost, power, and volume
[Patterson88]. As a result, disk striping research has
increased dramatically over the past few years.
Kim [Kim86] proposes a striping unit of one byte
(byte-interleaving). Using queuing models, she finds that,
under light loads, byte-interleaving yields higher
throughput than a collection of non-cooperating disks
Figure 1: Definition of a Striping Unit. This figure shows the mapping of logical data to the disks for a striping unit of two sectors. The numbers in the figure are logical sectors; the two circled sectors constitute one striping unit.