
Genetic Algorithms to Learn Feature Weights for
the Nearest Neighbor Algorithm
Gülşen Demiröz and H. Altay Güvenir
Dept. of Computer Engr. and Info. Sci.
Bilkent University, 06533 Ankara, Turkey
{demiroz, [email protected]
BU-CEIS-9612
Abstract
In this paper we use genetic algorithms to learn feature weights for the Nearest Neighbor classification algorithm. We represent feature weights as real values in [0, 1] whose sum is 1. A new crossover operation, called continuous uniform crossover, is introduced; it preserves the legality of chromosomes after the crossover operation. This new crossover operation is compared with three other crossover operations (one-point crossover, two-point crossover, and uniform crossover), all of which require normalization after the crossover operation. Four genetic algorithms, one using each of these four crossover operations, are implemented. Each genetic algorithm tries to learn feature weights that improve the classification accuracy of the Nearest Neighbor algorithm. The learned weights are then assigned to the features used in the distance calculations of the Weighted Nearest Neighbor classification algorithm, and the resulting classification accuracies on our datasets are compared.
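The legality constraint above (non-negative weights summing to 1) can be preserved without renormalization if each child gene is a convex combination of the parents' genes. The sketch below illustrates one such operator; the exact formulation of the paper's continuous uniform crossover may differ, and the function name and mixing scheme here are assumptions for illustration.

```python
import random

def continuous_uniform_crossover(p1, p2, rng=random):
    """Hypothetical sketch of a legality-preserving crossover.

    Each child gene is a convex combination of the corresponding
    parent genes, so if both parents lie in [0, 1] and sum to 1,
    both children do as well -- no normalization step is needed.
    """
    lam = rng.random()  # mixing coefficient drawn from [0, 1]
    c1 = [lam * a + (1 - lam) * b for a, b in zip(p1, p2)]
    c2 = [(1 - lam) * a + lam * b for a, b in zip(p1, p2)]
    return c1, c2
```

By contrast, one-point, two-point, and uniform crossover swap whole genes between parents, so the children's weights generally no longer sum to 1 and must be renormalized afterwards.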
1 Introduction
In recent years, instance-based learning algorithms (Dasarathy 1990, Aha et al. 1991, Wettschereck 1994) have been shown to work as well as other, more sophisticated machine learning methods despite their simplicity. One of the most common instance-based schemes is the Nearest Neighbor (NN) algorithm, which classifies new instances according to some distance measure between instances. The classical Nearest Neighbor classification algorithm treats all features as equivalent, and thus performs poorly in the presence of irrelevant, weakly relevant, and/or noisy features.
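A minimal sketch of the Weighted Nearest Neighbor scheme described above, assuming a weighted Euclidean distance over numeric features (the paper's exact distance measure may differ; the function names here are illustrative):

```python
import math

def weighted_distance(x, y, weights):
    # Weighted Euclidean distance between two feature vectors;
    # with equal weights this reduces to the classical NN distance.
    return math.sqrt(sum(w * (a - b) ** 2
                         for w, a, b in zip(weights, x, y)))

def classify_nn(query, training, weights):
    # training is a list of (feature_vector, class_label) pairs;
    # the query takes the label of its nearest training instance.
    nearest = min(training,
                  key=lambda inst: weighted_distance(query, inst[0], weights))
    return nearest[1]
```

Assigning a weight of zero to a feature removes its influence on the distance entirely, which is how learned weights can suppress irrelevant or noisy features.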
To reduce the impact of irrelevant and weakly relevant features and to increase the impact of strongly relevant features on the distance measure, several feature weighting methods have been proposed. In Wettschereck and Aha (1995), a five-dimensional framework that categorizes automated weight-learning methods is introduced. The first dimension, called the feedback dimension, concerns whether the feature weighting method receives feedback from the induction algorithm it is trying to improve. Methods that receive feedback are called feedback methods, and those that do not receive any feedback are