page 1  (12 pages)
2to next section

An Analytic Performance Model of Disk Arrays

Edward K. Lee

University of California

571 Evans Hall

Berkeley, CA 94720

[email protected]

Randy H. Katz

University of California

571 Evans Hall

Berkeley, CA 94720

[email protected]

Abstract

As disk arrays become widely used, tools for understanding and analyzing their performance become increasingly important. In particular, performance models can be invaluable in both configuring and designing disk arrays. Accurate analytic performance models are preferable to other types of models because they can be quickly evaluated, are applicable under a wide range of system and workload parameters, and can be manipulated by a range of mathematical techniques. Unfortunately, analytic performance models of disk arrays are difficult to formulate due to the presence of queueing and fork-join synchronization; a disk array request is broken up into independent disk requests which must all complete to satisfy the original request. In this paper, we develop and validate an analytic performance model for disk arrays. We derive simple equations for approximating their utilization, response time and throughput. We validate the analytic model via simulation, investigate the error introduced by each approximation used in deriving the analytic model, and examine the validity of some of the conclusions that can be drawn from the model.

1 Introduction

Disk arrays provide high I/O performance by striping data over multiple disks. High performance is achieved by servicing multiple I/O requests concurrently and by using several disks to service a single request in parallel. Given the increasing importance of disk arrays as high-performance secondary storage systems [7, 8, 12, 14, 15, 17], tools for understanding their performance become increasingly important. In particular, perfor-

mance models, combined with a thorough understanding of an installation's workload, will be invaluable in both configuring and designing disk arrays. In general, accurate analytic performance models are preferable to other types of models, such as empirical and simulation, because they can be quickly evaluated, are applicable under a wide range of system and workload parameters, and can be manipulated by a range of mathematical techniques. Even when analytic models are not directly applicable to a particular system or workload, they are frequently useful for quickly analyzing general properties of the system, stimulating intuition and furthering understanding.

Unfortunately, analytic performance models of disk arrays are difficult to formulate due to the presence of queueing and fork-join synchronization; a disk array request is broken up into independent disk requests, all of which must complete to satisfy the disk array request. While systems with only of queueing or fork-join synchronization are frequently tractable given certain reasonable approximations, the combination of the two result in systems that are very difficult to analyze analytically. Exact analytic solutions for the two server fork-join queue given Poisson arrivals and independent service times currently exist [1, 4] but the k-server forkjoin queue remains unsolved. Other related work in the field of disk array performance falls into four primary categories: (1) simulation studies [3, 12, 16], (2) analytic models that ignore queueing effects [2, 9, 17], (3) analytic models that ignore fork-join synchronization [8] and (4) restricted queueing models that deal with forkjoin synchronization using specialized techniques not easily extended to modeling disk arrays [5, 13]. Most analytic queueing studies deal with general queueing systems rather than disk arrays in particular.

In this paper, we develop and validate an analytic performance model for asynchronous disk arrays. Our model is different from previous analytic models of disk arrays mentioned above for the following reasons. First, we use a closed queueing model with a fixed number of processes whereas previous analytic models of disk