page 1  (12 pages) 2  Decomposing Irregularly Sparse Matrices

for Parallel Matrix-Vector Multiplication?

?Umit V. C? ataly?urek and Cevdet Aykanat
Computer Engineering Department, Bilkent University
06533 Bilkent, Ankara, Turkey

Abstract. In this work, we show the deficiencies of the graph model for decomposing sparse matrices for parallel matrix-vector multiplication. Then, we propose two hypergraph models which avoid all deficiencies of the graph model. The proposed models reduce the decomposition problem to the well-known hypergraph partitioning problem widely encountered in circuit partitioning in VLSI. We have implemented fast Kernighan-Lin based graph and hypergraph partitioning heuristics and used the successful multilevel graph partitioning tool (Metis) for the experimental evaluation of the validity of the proposed hypergraph models. We have also developed a multilevel hypergraph partitioning heuristic for experimenting the performance of the multilevel approach on hypergraph partitioning. Experimental results on sparse matrices, selected from Harwell-Boeing collection and NETLIB suite, confirm both the validity of our proposed hypergraph models and appropriateness of the multilevel approach to hypergraph partitioning.

1 Introduction

Iterative solvers are widely used for the solution of large, sparse, linear system of equations on multicomputers. Three basic types of operations are repeatedly performed at each iteration. These are linear operations on dense vectors, inner product(s) of dense vectors, and sparse-matrix vector product of the form y = Ax , where y and x are dense vectors, and A is a matrix with the same sparsity structure as the coefficient matrix [5, 12]. All of these basic operations can be performed concurrently by distributing either the rows or the columns of the matrix A and the components of the dense vectors in the same way. These two decomposition schemes are referred here as rowwise and columnwise decomposition schemes, respectively. Note that these two decomposition schemes are one-dimensional decomposition of matrix A which is a two-dimensional data structure. Both of these two decomposition schemes induce a computational distribution such that each processor is held responsible for updating the values of those vector components assigned to itself. With this data distribution scheme, linear vector operations and inner-product operations can be easily and efficiently parallelized by an even distribution of vector components to processors [5, 12]. Linear vector operations do not necessitate communication, whereas

? This work is partially supported by the Commission of the European Communities, Directorate General for Industry under contract ITDC 204-82166, and Turkish Science and Research Council under grant EEEAG-160.