
Decomposing Irregularly Sparse Matrices
for Parallel Matrix-Vector Multiplication*
Ümit V. Çatalyürek and Cevdet Aykanat
Computer Engineering Department, Bilkent University
06533 Bilkent, Ankara, Turkey
Abstract. In this work, we show the deficiencies of the graph model for decomposing sparse matrices for parallel matrix-vector multiplication. We then propose two hypergraph models which avoid all deficiencies of the graph model. The proposed models reduce the decomposition problem to the well-known hypergraph partitioning problem widely encountered in circuit partitioning in VLSI. We have implemented fast Kernighan-Lin based graph and hypergraph partitioning heuristics and used the successful multilevel graph partitioning tool (Metis) for the experimental evaluation of the validity of the proposed hypergraph models. We have also developed a multilevel hypergraph partitioning heuristic to experiment with the performance of the multilevel approach on hypergraph partitioning. Experimental results on sparse matrices, selected from the Harwell-Boeing collection and the NETLIB suite, confirm both the validity of our proposed hypergraph models and the appropriateness of the multilevel approach to hypergraph partitioning.
1 Introduction
Iterative solvers are widely used for the solution of large, sparse, linear systems of equations on multicomputers. Three basic types of operations are repeatedly performed at each iteration. These are linear operations on dense vectors, inner product(s) of dense vectors, and the sparse matrix-vector product of the form y = Ax, where y and x are dense vectors, and A is a matrix with the same sparsity structure as the coefficient matrix [5, 12]. All of these basic operations can be performed concurrently by distributing either the rows or the columns of the matrix A and the components of the dense vectors in the same way. These two decomposition schemes are referred to here as row-wise and column-wise decomposition schemes, respectively. Note that these two schemes are one-dimensional decompositions of the matrix A, which is a two-dimensional data structure. Both decomposition schemes induce a computational distribution such that each processor is held responsible for updating the values of those vector components assigned to it. With this data distribution scheme, linear vector operations and inner-product operations can be easily and efficiently parallelized by an even distribution of vector components to processors [5, 12]. Linear vector operations do not necessitate communication, whereas
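To make the row-wise decomposition concrete, the following is a minimal sketch (not taken from the paper) of how a 1-D row-wise distribution of y = Ax can be simulated sequentially. The matrix is stored in CSR form; each hypothetical "processor" owns a contiguous, evenly sized block of rows and the matching block of y, and computes its y entries independently. The function name and the even contiguous-block distribution are illustrative assumptions, not the paper's partitioning method (whose whole point is to find a better, irregular partition).

```python
# Hypothetical sketch: row-wise 1-D decomposition of y = A x with A in CSR form.
# Each simulated processor p owns rows [starts[p], starts[p+1]) and computes
# the corresponding entries of y on its own.

def csr_spmv_rowwise(indptr, indices, data, x, n_parts):
    n = len(indptr) - 1          # number of rows of A
    y = [0.0] * n
    # Even contiguous row distribution (an assumption for illustration).
    starts = [p * n // n_parts for p in range(n_parts + 1)]
    for p in range(n_parts):     # simulate each processor in turn
        for i in range(starts[p], starts[p + 1]):
            s = 0.0
            for k in range(indptr[i], indptr[i + 1]):
                # x[indices[k]] may live on another processor: in a real
                # parallel run these accesses are the communicated entries.
                s += data[k] * x[indices[k]]
            y[i] = s
    return y
```

For example, for the 3x3 matrix with rows (2, 0, 1), (0, 3, 0), (4, 0, 5) and x = (1, 1, 1), the call `csr_spmv_rowwise([0, 2, 3, 5], [0, 2, 1, 0, 2], [2.0, 1.0, 3.0, 4.0, 5.0], [1.0, 1.0, 1.0], 2)` returns [3.0, 3.0, 9.0]. The off-block column indices accessed by each row block are exactly the x entries that would have to be communicated, which is the cost the graph and hypergraph models of this paper try to capture.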
* This work is partially supported by the Commission of the European Communities, Directorate General for Industry under contract ITDC 20482166, and the Turkish Science and Research Council under grant EEEAG160.