
Case Study: How Modeling Revealed Serious Performance Problems in Distributed (DCE) Systems

A. M. Khandker


T. J. Teorey


1 Introduction

The Open Software Foundation's Distributed Computing Environment (OSF/DCE) [8] is a platform for distributed computing. DCE is a collection of tools and services for the development, use, and maintenance of transparent distributed application systems. The communication paradigm supported by DCE is the synchronous Remote Procedure Call (RPC) [1].

RPCs can be implemented on top of any transport layer protocol, such as TCP or UDP. RPCs over UDP can be optimized more than those over TCP. They are therefore faster in general, and they are the focus of this paper.

Fundamental to the overall performance of DCE is the RPC round trip time, also known as latency or response time. Round trip time is the time elapsed between when an RPC is invoked and when it is returned. In this paper, we focus on the round trip time of DCE RPC.
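The definition above can be made concrete with a minimal measurement sketch. It is not DCE code: `measure_round_trip` and `fake_rpc` are hypothetical names, and the "remote" procedure is a local stand-in that only sleeps. The sketch assumes a synchronous call, so the elapsed wall-clock time around the call is the round trip time.

```python
import time


def measure_round_trip(call, *args, repeats=100):
    """Return the mean wall-clock round trip time, in seconds, of call(*args)."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()  # instant at which the call is invoked
        call(*args)                  # synchronous: blocks until the call returns
        samples.append(time.perf_counter() - start)
    return sum(samples) / len(samples)


def fake_rpc(delay_s):
    """Hypothetical stand-in for a remote procedure; it only sleeps."""
    time.sleep(delay_s)


# Assumed example: a call that takes about 2 ms, measured 20 times.
mean_rtt = measure_round_trip(fake_rpc, 0.002, repeats=20)
```

Averaging over repeated calls smooths out scheduling noise in individual samples; a real experiment would also discard warm-up calls.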

Our earlier work describes analytic performance modeling techniques for distributed application systems [4]. Unfortunately, the techniques could not be validated because the model-predicted and measured round trip times did not match: where the model predicted a decrease in the RPC round trip time, the measurements showed an increase. The model's prediction followed intuition, whereas the actual measurement was counterintuitive. We concluded that a performance bug in the system was causing the round trip time anomaly and investigated the reason. The result of that investigation is described in this paper.

The objective of this paper is to illustrate how modeling a distributed system can reveal serious performance problems and lead to performance improvement.

We start with background on DCE RPC in Section 2. In Section 3 we develop a queueing network model for RPC and suggest a simple extension to the Mean Value Analysis (MVA) algorithm [6] to account for the parallelism present in inter-machine RPCs. In Section 4 we rediscover the anomaly by comparing the model-predicted round trip times with the measured round trip times. Section 4.3 describes the anomaly, Section 4.4 discusses its cause, and Section 4.5 suggests a fix. Section 5 presents our conclusions and future work.
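For readers unfamiliar with MVA, the following is a minimal sketch of the standard textbook exact MVA recursion for a closed, single-class network of queueing centers. It is not the extension for inter-machine parallelism proposed in this paper. The function name `mva` and the service demands in the example are assumptions chosen for illustration.

```python
def mva(demands, n_customers, think_time=0.0):
    """Exact single-class MVA for a closed network of queueing centers.

    demands[k] is the total service demand (seconds per job) at center k.
    Returns (response_time, throughput, queue_lengths) at n_customers.
    """
    queue = [0.0] * len(demands)  # mean queue lengths with zero customers
    response, throughput = 0.0, 0.0
    for n in range(1, n_customers + 1):
        # Arrival theorem: an arriving customer sees the queue lengths
        # of the same network with one fewer customer.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        response = sum(residence)
        throughput = n / (think_time + response)
        # Little's law applied to each center.
        queue = [throughput * r for r in residence]
    return response, throughput, queue


# Hypothetical demands: client CPU, network, server CPU (seconds per RPC).
rtt, rate, lengths = mva([0.010, 0.004, 0.010], n_customers=1)
```

With a single customer there is no queueing, so the predicted round trip time is simply the sum of the service demands. Larger populations add queueing delay at each center.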

2 Background on DCE


In a typical DCE configuration, potential servers export descriptions of the service they provide into the cell directory service (CDS) via the name service interface (NSI). Before making an RPC, a client obtains a description of services (e.g., by importing from the CDS) and chooses a compatible server. This process is known as the binding process. The end product of the binding process is a binding handle, which is a reference to binding information stored in the RPC runtime.1 The client uses

1 The RPC runtime is a layer of software on top of the transport protocol that provides general support for RPC operations.