page 1  (6 pages)
2to next section

Pruning and Grouping Discovered Association Rules

H Toivonen M Klemettinen P Ronkainen K H?at?onen H Mannila

Department of Computer Science, P.O. Box 26
FIN-00014 University of Helsinki, Finland
email: [email protected]

Abstract

Association rules are statements of the form "for 90 % of the rows of the relation, if the row has value 1 in the columns in set X, then it has 1 also in the columns in set Y ". Efficient methods exist for discovering association rules from large collections of data. The number of discovered rules can, however, be so large that the rules cannot be presented to the user. We show how the set of rules can be pruned by forming rule covers. A rule cover is a subset of the original set of rules such that for each row in the relation there is an applicable rule in the cover if and only if there is an applicable rule in the original set. We also discuss grouping of association rules by clustering, and present some experimental results of both pruning and grouping.
Keywords: data mining, association rules, covers, clustering.

1 Introduction

Association rules are an interesting class of database regularities, introduced by Agrawal, Imielinski, and Swami [AIS93]. An association rule is an expression X ) Y , where X and Y are sets of attributes. The intuitive meaning of such a rule is that in the rows of the database where the attributes in X have value true, also the attributes in Y tend to have value true. Efficient methods exist for the discovery of association rules [MTV94, AS94].

Paradoxically, data mining itself can produce such great amounts of data that there is a new knowledge management problem: there can easily be thousands or even more association rules holding in a data set. For instance, we discovered 2000 rules from a course enrollment database, and almost 30000 rules from a telecommunications network alarm database. In this paper we discuss pruning and grouping of association rules, in order to improve the understandability of the collection of discovered association rules.

The rest of this paper is organized as follows. Association rules and there properties are discussed in Section 2. In Section 3 we present how rule sets can be pruned by forming rule covers. Grouping of association rules is discussed in Section 4, while Section 5 is a conclusion.

2 Association rules and their properties

Let R = fI1; I2; : : : ; Img be a set of attributes, also called items, over the binary domain f0; 1g. The input r = ft1; : : : ; tng for the data mining method is a relation over the relation schema fI1; I2; : : : ; Img, i.e., a set of binary vectors of size m. Each row can be considered as a set of properties or items (that is, t[i] = 1 , Ii 2 t).

Let X ? R be a set of attributes and t 2 r a row of the relation. If t[A] = 1 for all A 2 X , we write t[X ] = ?1. The set of rows matched by X is m(X) = ft 2 r j t[X ] = ?1g. An association rule over r is an expression X ) Y , where X ? R and Y ? R n X . Given real