
In Defense of C4.5: Notes on Learning One-Level Decision Trees

Tapio Elomaa
Department of Computer Science
P. O. Box 26 (Teollisuuskatu 23)
FIN-00014 University of Helsinki, Finland
[email protected]

To appear in W. Cohen & H. Hirsh (eds.), Machine Learning: Proceedings of the Eleventh International Conference (New Brunswick, NJ, July 1994). Morgan Kaufmann, San Francisco, CA.

Abstract

We discuss the implications of Holte's recently published article, which demonstrated that on the most commonly used data sets very simple classification rules are almost as accurate as the decision trees produced by Quinlan's C4.5. We consider, in particular, the significance of Holte's results for the future of top-down induction of decision trees. To an extent, Holte questioned the sense of further research on multilevel decision tree learning. We go through all parts of Holte's study in detail and try to put the results into perspective. We argue that the difference in accuracy between 1R and C4.5 observed by Holte, though small in absolute terms, is still significant, and we claim that C4.5 possesses additional accuracy-related advantages over 1R. In addition, we discuss the representativeness of the databases used by Holte. We compare empirically the optimal accuracies of multilevel and one-level decision trees and observe some significant differences. Finally, we point out several deficiencies of limited-complexity classifiers.

1 INTRODUCTION

In his recent article, Holte (1993) empirically examined the classification accuracy of one-level decision trees, i.e., concept classifiers that decide an instance's class on the basis of the value of a single attribute. He compared their accuracy to that of the state-of-the-art decision tree learner C4.5 (Quinlan, 1993). The test domains included a large portion of the domains that have, over the years, served as standard test cases for new inductive learning programs. These domains can nowadays be found in the University of California at Irvine repository of machine learning data sets (Murphy and Aha, 1992).

Holte's experiments revealed that the average classification accuracy achieved by C4.5 over 16 commonly used data sets is not much higher than that achieved by his learning algorithm 1R, which produces one-level decision trees.
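To make the comparison concrete, the following is a minimal sketch of the 1R idea for nominal attributes: each candidate attribute induces a one-level tree that maps every observed attribute value to the majority class among the training examples having that value, and the attribute whose tree makes the fewest training errors is retained. The function name one_r and the data representation below are ours for illustration only; Holte's actual 1R additionally discretizes continuous attributes and handles missing values, steps this sketch omits.

    from collections import Counter, defaultdict

    def one_r(examples):
        """Sketch of 1R for nominal attributes.

        examples: list of (attribute_value_tuple, class_label) pairs.
        Returns a classifier built on the single best attribute.
        """
        n_attributes = len(examples[0][0])
        best = None  # (training errors, attribute index, value -> class rule)
        for a in range(n_attributes):
            # Count class frequencies for each value of attribute a.
            counts = defaultdict(Counter)
            for values, label in examples:
                counts[values[a]][label] += 1
            # One-level tree: each value predicts its majority class.
            rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
            errors = sum(rule[values[a]] != label for values, label in examples)
            if best is None or errors < best[0]:
                best = (errors, a, rule)
        _, attr, rule = best
        return lambda values: rule.get(values[attr])

    # Hypothetical usage:
    # data = [(("sunny", "hot"), "no"), (("rainy", "hot"), "yes"),
    #         (("sunny", "mild"), "no"), (("rainy", "mild"), "yes")]
    # classify = one_r(data)
    # classify(("rainy", "hot"))  # -> "yes"

Note that the 0-level baseline discussed below corresponds to skipping the per-attribute split altogether and always predicting the overall majority class.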

Holte, on the one hand, discusses the implications of his results quite sparingly and, on the other hand, draws quite far-reaching conclusions about their significance. The hastiest have already condemned further research on top-down induction of decision trees (TDIDT) as futile. We do not believe that to be the case, at least not on the basis of Holte's study. These notes present our grounds.

In this paper we argue that the prediction accuracy differences between 1R and C4.5 observed by Holte are more significant than he suggests. Furthermore, C4.5 possesses the additional advantage of stable classification accuracy, a property that 1R lacks (Section 2). We also consider, in Section 3, how representative the data sets collected in the UCI repository are of current and future application domains of inductive learning programs, and we claim that the "naturality" of a data set is an often misused criterion. In Section 4, we pay a brief visit to the classification accuracy upper bound considerations introduced by Holte and carry out an empirical comparison of the optimal classification accuracies of the TDIDT and one-level decision tree learning approaches. Section 5, then, concentrates on concept representation issues; in particular, we argue that there are inherent downsides to using extremely simple classifiers and that the "simplicity first" methodology promoted by Holte does not constitute a good basis for knowledge acquisition. In Section 6 we discuss related work, both empirical and theoretical. Finally, Section 7 presents the main conclusions of the current discussion.

2 PREDICTION ACCURACY

The fact that quite simple classification rules perform well on many of the UCI repository data sets really should not come as a great surprise to anyone who has spent some time experimenting with these domains. It is in fact curious that no one before Holte saw fit to use the accuracy of one-level decision trees as the benchmark accuracy measure in empirical experiments, instead of the often reported accuracy of the 0-level decision tree (a single leaf predicting the most common class value). Surely, it has been known that the latter is in many cases not a good reference point. In more recent experiments, one can consider that C4.5's ancestor ID3