| ![]() |
Pruning and Grouping Discovered Association Rules
H Toivonen M Klemettinen P Ronkainen K H?at?onen H Mannila
Department of Computer Science, P.O. Box 26
FIN-00014 University of Helsinki, Finland
email: [email protected]
Abstract
Association rules are statements of the form "for 90 % of the rows of the relation,
if the row has value 1 in the columns in set X, then it has 1 also in the columns in
set Y ". Efficient methods exist for discovering association rules from large collections of
data. The number of discovered rules can, however, be so large that the rules cannot
be presented to the user. We show how the set of rules can be pruned by forming rule
covers. A rule cover is a subset of the original set of rules such that for each row in the
relation there is an applicable rule in the cover if and only if there is an applicable rule in
the original set. We also discuss grouping of association rules by clustering, and present
some experimental results of both pruning and grouping.
Keywords: data mining, association rules, covers, clustering.
1 Introduction
Association rules are an interesting class of database regularities, introduced by Agrawal, Imielinski, and Swami [AIS93]. An association rule is an expression X ) Y , where X and Y are sets of attributes. The intuitive meaning of such a rule is that in the rows of the database where the attributes in X have value true, also the attributes in Y tend to have value true. Efficient methods exist for the discovery of association rules [MTV94, AS94].
Paradoxically, data mining itself can produce such great amounts of data that there is a new knowledge management problem: there can easily be thousands or even more association rules holding in a data set. For instance, we discovered 2000 rules from a course enrollment database, and almost 30000 rules from a telecommunications network alarm database. In this paper we discuss pruning and grouping of association rules, in order to improve the understandability of the collection of discovered association rules.
The rest of this paper is organized as follows. Association rules and there properties are discussed in Section 2. In Section 3 we present how rule sets can be pruned by forming rule covers. Grouping of association rules is discussed in Section 4, while Section 5 is a conclusion.
2 Association rules and their properties
Let R = fI1; I2; : : : ; Img be a set of attributes, also called items, over the binary domain f0; 1g. The input r = ft1; : : : ; tng for the data mining method is a relation over the relation schema fI1; I2; : : : ; Img, i.e., a set of binary vectors of size m. Each row can be considered as a set of properties or items (that is, t[i] = 1 , Ii 2 t).
Let X ? R be a set of attributes and t 2 r a row of the relation. If t[A] = 1 for all A 2 X , we write t[X ] = ?1. The set of rows matched by X is m(X) = ft 2 r j t[X ] = ?1g. An association rule over r is an expression X ) Y , where X ? R and Y ? R n X . Given real