
Genetic Algorithms to Learn Feature Weights for

the Nearest Neighbor Algorithm

Gülşen Demiröz and H. Altay Güvenir
Dept. of Computer Engr. and Info. Sci.
Bilkent University, 06533 Ankara, Turkey
demiroz, [email protected]

BU-CEIS-9612

Abstract

In this paper we use genetic algorithms to learn feature weights for the Nearest Neighbor classification algorithm. Feature weights are represented as real values in [0, 1] that sum to 1. A new crossover operation, called continuous uniform crossover, is introduced; it preserves the legality of chromosomes after the crossover operation. This new crossover operation is compared with three others (one-point crossover, two-point crossover, and uniform crossover), all of which require normalization after the crossover operation. Four genetic algorithms, one for each of these four crossover operations, are implemented. Each genetic algorithm tries to learn feature weights that improve the classification accuracy of the Nearest Neighbor algorithm. The learned weights are then assigned to features and used in the distance calculations of the Weighted Nearest Neighbor classification algorithm, and the resulting classification accuracies on our datasets are compared.
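The abstract's notion of a legality-preserving crossover can be illustrated with a small sketch. The paper's continuous uniform crossover is defined later in the text; the convex (arithmetic) combination below is only one plausible way to preserve legality, shown here as a hypothetical example: if both parent weight vectors are non-negative and sum to 1, any convex combination of them does too, so no renormalization is needed.

```python
import random

def convex_crossover(p1, p2, rng=random):
    """Hypothetical legality-preserving crossover: each child is a convex
    combination of the two parent weight vectors. If both parents are
    non-negative and sum to 1, both children do as well, so no
    post-crossover normalization is required. This is an illustration of
    the legality-preservation idea, not the paper's exact operator."""
    lam = rng.random()  # mixing coefficient in [0, 1)
    c1 = [lam * a + (1 - lam) * b for a, b in zip(p1, p2)]
    c2 = [(1 - lam) * a + lam * b for a, b in zip(p1, p2)]
    return c1, c2

p1 = [0.5, 0.3, 0.2]
p2 = [0.1, 0.6, 0.3]
c1, c2 = convex_crossover(p1, p2)
print(round(sum(c1), 10), round(sum(c2), 10))  # both children still sum to 1.0
```

By contrast, one-point, two-point, and uniform crossover swap gene segments between parents, which generally breaks the sum-to-1 constraint and forces a normalization step afterwards.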

1 Introduction

In recent years, instance-based learning algorithms (Dasarathy 1990, Aha et al. 1991, Wettschereck 1994) have been shown to work as well as other, more sophisticated machine learning methods despite their simplicity. One of the most common instance-based schemes is the Nearest Neighbor (NN) algorithm, which classifies new instances according to a distance measure between instances. The classical Nearest Neighbor classification algorithm treats all features as equally important, and thus performs poorly in the presence of irrelevant, weakly relevant, and/or noisy features.

To reduce the impact of irrelevant and weakly relevant features and to increase the impact of strongly relevant features on the distance measure, several feature weighting methods have been proposed. Wettschereck and Aha (1995) introduce a five-dimensional framework that categorizes automated weight-learning methods. The first dimension, called the feedback dimension, concerns whether the feature weighting method receives feedback from the induction algorithm it is trying to improve. The methods that receive feedback are called feedback methods, and the others, which do not receive any feedback, are