page 1  (12 pages)
2to next section

Bottom-up Induction of Functional Dependencies from

Relations

Iztok Savnik
Jo<=zef Stefan Institute,
Jamova 39, 61000 Ljubljana, Slovenija
e-mail: [email protected]

Peter A. Flach
Institute for Language Technology & Artificial Intelligence,
Tilburg University, POBox 90153, 5000 LE Tilburg, the Netherlands e-mail: [email protected]

Abstract

Data dependencies express the presence of structure in database relations, that can be utilised in the database design process. The discovery of data dependencies can be viewed as an induction process. Like in induction, we can distinguish between topdown approaches and bottom-up approaches. In top-down approach, dependencies are generated and then tested against the given relation. Since each test requires O(n2) comparisons, where n is the number of tuples in relation, this can be computationally costly. We propose an alternative approach which differs from the top-down approach in that it starts with an analysis of the tuples in the relation: a bottom-up approach.

1 Introduction

Data dependencies are among the basic tools for modelling relational databases. They are used for the representation of constraints on the possible relations that can be instances of the relational scheme. Many types of data dependencies have been introduced and studied in the last two decades [8]. Of these types, functional and multivalued dependencies are the ones that are most commonly found in real environments. Consequently, functional and multivalued dependencies are quite extensively studied and applied in the database design process.

Usually, data dependencies are invented by the designer during the database design process. In the first step of the database design, the choice of the relational schemes that represent concepts in the Universe of Discourse is usually not influenced by the known data dependencies. They are used as constraints that can guide the detailed design process of the database conceptual scheme. In particular, data dependencies are used for checking database consistency, and for eliminating redundancy by decomposing the relation into