

Computer Engineering and Information Science Department
Bilkent University, Ankara 06533 TURKEY
E-mail: sirin, [email protected]


Abstract

This paper presents a new methodology for concept learning from examples, called Classification by Feature Partitioning (CFP), an inductive, incremental, and supervised learning method. Learning in CFP is accomplished by storing the training objects separately in each feature dimension as disjoint partitions of values. A partition is expanded through generalization or specialized by being divided into subpartitions. An empirical evaluation of CFP on real-world data sets is presented.

1. Introduction

This paper presents a form of exemplar-based learning, called Classification by Feature Partitioning (CFP). In exemplar-based learning, examples are stored in memory verbatim. The CFP technique makes several significant improvements over other exemplar-based learning algorithms. For example, the IBL1 algorithms learn a set of instances that is a representative subset of all training examples, and another algorithm, EACH5, learns a set of hyperrectangles covering the examples. The CFP method, in contrast, stores instances factored out by their feature values: it partitions each feature dimension into segments corresponding to concepts. The concept description learned by CFP is therefore a collection of feature partitions.
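The idea of per-feature partitions can be sketched as follows. This is a minimal, hypothetical illustration, not the full CFP algorithm: every name here is invented, and the generalization and splitting steps mentioned in the abstract are deliberately simplified away.

```python
# Sketch: each feature dimension keeps its own list of disjoint value
# segments, each associated with a class label.

class Segment:
    def __init__(self, lo, hi, label):
        self.lo, self.hi, self.label = lo, hi, label

    def contains(self, v):
        return self.lo <= v <= self.hi

def train(examples, num_features):
    """examples: list of (feature_values, class_label) pairs."""
    partitions = [[] for _ in range(num_features)]
    for values, label in examples:
        for f, v in enumerate(values):
            segs = partitions[f]
            hit = next((s for s in segs if s.contains(v)), None)
            if hit is None:
                segs.append(Segment(v, v, label))  # start a new point segment
            # generalizing a same-class segment toward v, and splitting a
            # segment on a class clash, are omitted in this sketch
    return partitions
```

Note that each feature dimension is updated independently; no combined instance representation is ever stored.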

Each feature contributes to the classification process with its local knowledge, and the final classification is based on a vote among the predictions of the individual features. The CFP algorithm significantly reduces classification complexity compared with other exemplar-based techniques. The strength of a feature's contribution in the voting process is determined by the weight of that feature.
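The weighted vote described above might look like the following self-contained sketch; the segment boundaries, weights, and function names are invented for illustration and are not taken from the CFP paper.

```python
# Sketch: each feature looks up the segment containing its value and casts
# a vote, weighted by the feature's strength, for that segment's class.
from collections import defaultdict

def classify(partitions, weights, values):
    """partitions[f]: list of (lo, hi, label) segments for feature f;
    weights[f]: voting strength of feature f."""
    votes = defaultdict(float)
    for f, v in enumerate(values):
        for lo, hi, label in partitions[f]:
            if lo <= v <= hi:
                votes[label] += weights[f]  # feature f casts a weighted vote
                break
    return max(votes, key=votes.get) if votes else None

partitions = [
    [(0, 4, 'neg'), (5, 9, 'pos')],  # feature 0 segments
    [(0, 2, 'pos'), (3, 9, 'neg')],  # feature 1 segments
]
weights = [2.0, 1.0]
print(classify(partitions, weights, (6, 4)))  # 'pos': feature 0 outvotes feature 1
```

Classification is a per-feature segment lookup rather than a search over stored instances, which is the source of the complexity reduction claimed above.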

Most real-world data sets contain missing attribute values. Previous learning systems usually overcome this problem either by filling in missing attribute values or by using the probability distribution of an attribute's values. The most common approaches are compared in Quinlan (1993), with the general conclusion that no single approach is uniformly superior to the others. In contrast, CFP solves this problem very naturally: since CFP treats each attribute value separately, an unknown attribute value simply leaves the partitioning of that feature intact.
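The training-time rule for unknown values can be sketched in a few lines; the function and its segment representation are hypothetical, chosen only to show that no imputation or probability model is needed.

```python
# Sketch: adding one training value to a single feature's segment list.
# An unknown value (None here) leaves that feature's partitioning untouched.

def update_feature(segments, value, label):
    """segments: list of [lo, hi, label] for one feature dimension."""
    if value is None:
        return segments                      # unknown value: leave intact
    for lo, hi, _ in segments:
        if lo <= value <= hi:
            return segments                  # value already covered
    segments.append([value, value, label])   # start a new point segment
    return segments

segs = update_feature([], 3.0, 'A')
segs = update_feature(segs, None, 'B')       # missing value: no change
```

The same abstention applies at classification time: a feature with an unknown value simply casts no vote.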

In the next section the CFP algorithm is described (precise details are given in Şirin (1993)), and its voting process is illustrated through an example. Section 3 presents an empirical evaluation of the CFP algorithm on several real-world data sets. The final section discusses the applicability of the CFP algorithm.