In D.N.L. Levy and D.F. Beal, editors, Heuristic Programming in Artificial Intelligence 2: The Second Computer Olympiad. Ellis Horwood, 1991.
Exploratory Learning in the Game of GO
Barney Pell¹
University of Cambridge
Cambridge UK
E-mail: [email protected]
Abstract
This paper considers the importance of exploration to game-playing programs that learn by playing against opponents. The central question is whether a learning program should play the move that offers the best chance of winning the present game, or the move that has the best chance of providing useful information for future games. An approach to this question is developed using probability theory and then implemented in two different learning methods. Initial experiments in the game of Go suggest that a program that takes exploration into account can learn better against a knowledgeable opponent than a program that does not.
1 Introduction
One of the earliest aspirations of Artificial Intelligence was to develop computer game-playing programs that could improve their play through experience, adapt their strategy to compete against a variety of opponents, and ultimately outplay their programmers. As in most learning problems, a program
¹ Parts of this work have been supported by RIACS, NASA Ames Research Center [FIA], and a British Marshall Scholarship.