A Knowledge-Based Performance Tuning Tool for Parallel Programs
Kei-Chun Li, Kang Zhang
Department of Computing
Sydney, NSW 2109, Australia
The increasing complexity of parallel computing systems has brought about a crisis in parallel performance evaluation and tuning. Tools for performance measurement and visualization have become necessary parts of programming environments for parallel computers. However, today's performance analysis systems offer little more than basic measurement and analysis facilities for the sources of poor performance, such as load imbalance, communication overhead, and synchronization loss. Our experience in parallel programming shows that a system which provides higher-level performance measurement and analysis is more helpful in tuning the performance of parallel programs. For example, whether the programmer adopts an appropriate strategy or algorithm is one of the most important factors affecting the performance of a parallel program. We therefore argue that a helpful performance tuning tool should be able to assist programmers in optimising the strategy or algorithm used in their parallel programs. In this paper we introduce an intelligent performance tuning tool which detects and analyses the strategy and algorithm concepts in parallel programs, helps users rapidly identify the location and cause of performance problems, and suggests ways to improve the performance of their parallel programs.
The design of performance tuning tools for parallel programming has historically focused on analysing the data collected during the execution of a parallel program. A number of tools, such as ParaGraph, AIMS, and JEWEL, have been developed to collect and report performance data. They rely on the user to examine the collected data or the visualised performance and compare them with expected values to identify performance problems. Tools of this type face a challenge: identifying a performance bottleneck requires detailed performance data, yet collecting all of these data can make the trace files unmanageably large. At the same time, users are overwhelmed by volumes of complex graphs and tables that require a certain degree of expertise and experience to interpret. In some cases, even when the tools report information such as the sources of poor performance or the portion of the program that takes the longest to execute, the programmer still finds it difficult to improve the performance. We therefore believe that these tools provide only part of the solution, and that a system which can provide higher-level performance measurement and analysis is highly desirable.
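To make this data-driven workflow concrete, the following Python sketch shows the kind of summary such tools leave the user to interpret: a load-imbalance percentage computed from per-process execution times and compared against an expected value. The metric (how far the slowest process exceeds the mean), the function names, and the 10% default threshold are our own assumptions for illustration, not the behaviour of ParaGraph, AIMS, or JEWEL.

```python
from __future__ import annotations

from statistics import mean


def load_imbalance(times: list[float]) -> float:
    """Percentage by which the slowest process exceeds the mean time.

    Illustrative metric (an assumption); 0.0 means perfectly balanced.
    """
    avg = mean(times)
    return (max(times) / avg - 1.0) * 100.0


def exceeds_expected(times: list[float], expected_pct: float = 10.0) -> bool:
    """Flag a problem when the imbalance exceeds a user-supplied expected
    value (the 10% default is a hypothetical choice)."""
    return load_imbalance(times) > expected_pct


# Hypothetical per-process compute times, in seconds, for four processes.
times = [10.0, 10.0, 10.0, 14.0]
print(f"{load_imbalance(times):.1f}% imbalance, flagged: {exceeds_expected(times)}")
```

The result tells the user that one process is slower than the others, but not which part of the program, or which strategy decision, causes it.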
This paper presents an intelligent performance tuning tool, called PPA (Parallel Program Analyser), which uses program analysis techniques to detect the strategy and algorithm concepts in a parallel program and, if a better strategy exists, suggests it to the programmer. Because PPA derives algorithm concepts from the program text by statically examining its source code, without using any specification or execution information, it does not suffer from the trace-data overload that affects large-scale parallel programs.
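As a rough illustration of static, source-level concept detection (not PPA's actual knowledge representation, which the later sections of this paper describe), the following Python sketch uses the standard `ast` module to look for one hypothetical pattern: point-to-point `send` calls inside a `for` loop, a common hand-written broadcast. The rule, the advice text, the function names, and the mpi4py-style `comm.send` naming in the sample program are assumptions chosen for illustration. Like PPA, the sketch inspects only the program text and never runs the program, so no trace data is produced.

```python
from __future__ import annotations

import ast

# Hypothetical advice text attached to the illustrative rule below.
ADVICE = ("point-to-point sends inside a loop look like a hand-written "
          "broadcast; consider a collective operation such as comm.bcast")


def loop_calls_send(loop: ast.For) -> bool:
    """True if any statement nested in the loop body calls a method named
    `send` (an assumed mpi4py-style naming convention)."""
    for stmt in loop.body:
        for inner in ast.walk(stmt):
            if (isinstance(inner, ast.Call)
                    and isinstance(inner.func, ast.Attribute)
                    and inner.func.attr == "send"):
                return True
    return False


def find_loop_sends(source: str) -> list[tuple[int, str]]:
    """Statically scan source text; return (line, advice) per matching loop."""
    tree = ast.parse(source)  # parse only: the program is never executed
    return [(node.lineno, ADVICE) for node in ast.walk(tree)
            if isinstance(node, ast.For) and loop_calls_send(node)]


# Hypothetical master-worker fragment: rank 0 sends data to each worker in turn.
SAMPLE = (
    "if rank == 0:\n"
    "    for dest in range(1, size):\n"
    "        comm.send(data, dest=dest)\n"
    "else:\n"
    "    data = comm.recv(source=0)\n"
)
print(find_loop_sends(SAMPLE))
```

A rule like this is deliberately shallow: a send inside nested loops is reported once for each enclosing loop, and the rule cannot tell whether every process really receives the same data. Recognising such intent reliably is what a knowledge-based analyser has to do beyond simple pattern matching.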
The remainder of this paper is divided into six sections. Section 2 discusses related work. Section 3 provides an overview of the PPA system. Section 4 describes a human model of program understanding. Section 5 discusses programming concepts and events, using an example parallel program to demonstrate how programming concepts are represented in terms of events. Section 6 presents the knowledge representation and reasoning method used in PPA and shows the performance tuning advice it provides. Section 7 concludes the paper.
2. Related Work
To provide better guidance to the programmer, several tools have been developed that treat finding a performance bottleneck as a search problem. Such systems attempt to identify the problem and advise on how to relieve it.
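The search formulation can be sketched as refinement over a tree of performance hypotheses: a general hypothesis is tested against measured metrics, and only confirmed hypotheses are refined into more specific ones, each carrying a piece of advice. The Python sketch below is a minimal illustration under that assumption; the hypothesis names, metric keys, thresholds, and the `Hypothesis`/`search` names are hypothetical and are not taken from any particular tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

Metrics = dict[str, float]


@dataclass
class Hypothesis:
    """A node in a (hypothetical) tree of performance hypotheses."""
    name: str
    test: Callable[[Metrics], bool]  # does the measured data support it?
    advice: str = ""
    refinements: list[Hypothesis] = field(default_factory=list)


def search(h: Hypothesis, m: Metrics) -> list[tuple[str, str]]:
    """Depth-first search that refines a hypothesis only while its test holds.

    Returns (name, advice) for the most specific confirmed hypotheses;
    unconfirmed branches are pruned without visiting their children.
    """
    if not h.test(m):
        return []
    confirmed = [hit for child in h.refinements for hit in search(child, m)]
    return confirmed or [(h.name, h.advice)]


# Hypothetical hypothesis tree; names and thresholds are illustrative only.
ROOT = Hypothesis("slow program", lambda m: m["efficiency"] < 0.7, refinements=[
    Hypothesis("communication-bound", lambda m: m["comm_frac"] > 0.3,
               "reduce or aggregate messages", [
                   Hypothesis("many small messages",
                              lambda m: m["avg_msg_bytes"] < 1024,
                              "batch small messages into larger ones"),
               ]),
    Hypothesis("load imbalance", lambda m: m["imbalance_pct"] > 20,
               "redistribute work more evenly"),
])

# Hypothetical measured metrics for one run.
metrics = {"efficiency": 0.5, "comm_frac": 0.4,
           "avg_msg_bytes": 256.0, "imbalance_pct": 5.0}
print(search(ROOT, metrics))
```

Pruning at unconfirmed hypotheses keeps the search cheap, but every test still consumes measured execution data.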