
Input/Output Behavior of Supercomputing Applications

Ethan L. Miller
Computer Science Division
Department of Electrical Engineering
and Computer Science
University of California, Berkeley
Berkeley, CA 94720
December 14, 1990
Revised October 16, 1992


This paper describes the collection and analysis of supercomputer I/O traces
and their use in a set of buffering and caching simulations. This work
serves two purposes. First, it gives a model of how individual applications
running on supercomputers request file system I/O, allowing system designers
to optimize I/O hardware and file system algorithms to that model. Second,
the buffering simulations show what resources are needed to maximize
the CPU utilization of a supercomputer given a very bursty I/O request rate.
By using read-ahead and write-behind in a large solid-state disk, one or two
applications were sufficient to fully utilize a Cray Y-MP CPU.

1. Introduction

Over the last few years, CPUs have seen tremendous gains in performance. I/O systems and memory systems, however, have not enjoyed the same rate of increase. As a result, supercomputer applications are generating more data, but I/O systems are becoming less able to cope with this huge volume of information. Multiprocessors exacerbate the problem: although the number of disks and tape drives in the I/O system, and thus the aggregate I/O bandwidth, increases, that bandwidth is usually not scaled up at the same rate as the aggregate processing speed. According to Amdahl's metric, each MIPS (million instructions per second) should be accompanied by one Mbit per second of I/O. Solving this problem requires matching I/O bandwidth capability to application bandwidth requirements and using buffering to reduce the peak bandwidth that the I/O system must handle. To better determine the necessary hardware bandwidth and the software buffer sizes and policies, the I/O patterns of applications running on such computers must first be analyzed. To