
Learning in neural networks
with Bayesian prototypes
Proceedings of SOUTHCON'94 (Orlando, March 1994), 60{64.
Petri Myllym?aki and Henry Tirri
University of Helsinki, Department of Computer Science
P.O.Box 26, FIN00014 University of Helsinki, Finland
Abstract Given a set of samples of a probability distribution on a set of discrete random variables, we study the problem of constructing a good approximative neural network model of the underlying probability distribution. Our approach is based on an unsupervised learning scheme where the samples are first divided into separate clusters, and each cluster is then coded as a single vector. These Bayesian prototype vectors consist of conditional probabilities representing the attributevalue distribution inside the corresponding cluster. Using these prototype vectors, it is possible to model the underlying joint probability distribution as a simple Bayesian network (a tree), which can be realized as a feedforward neural network capable of probabilistic reasoning. In this framework, learning means choosing the size of the prototype set, partitioning the samples into the corresponding clusters, and constructing the cluster prototypes. We describe how the prototypes can be determined, given a partition of the samples, and present a method for evaluating the likelihood of the corresponding Bayesian tree. We also present a greedy heuristic for searching through the space of different partition schemes with different numbers of clusters, aiming at an optimal approximation of the probability distribution.
I. Introduction
This work aims at combining the ideas of memorybased reasoning (MBR) with neural network techniques, especially for developing massively parallel implementations for shallow knowledge" domains such as the ones discussed in [4]. Given a problem, a memorybased reasoning system will search its prototype memory and match the input vector against the stored prototype vectors, providing them with a similarity rank according to some similarity measure (prototype matching). The prototypes are then
This research was supported by the Technology Development Center (TEKES).
modified according to the rankings, in order to provide a solution consistent with the input vector (adaptation). As there is nothing that resembles a shared memory, the memorybased reasoning architecture is inherently parallel, and each element can perform comparison of its input against the value stored in the interconnections independently from the others (see the general MBR structure in Figure 1). This offers a linear speedup in the comparison process with respect to the number of computational elements available.
Neural network architectures are constructed from a large amount of highly connected elements, which makes them suitable for distortion tolerant storing of a large number of prototypes (represented by high dimensional vectors) by making single elements sensitive" to a stored item. As a matter of fact, many existing neural network architectures [9, 17, 18, 8, 3] can be seen as realizations of the general memorybased reasoning scheme. These massively parallel models, and other kernelbased estimators (e.g., [13, 1]) assume that the domain probability distribution can be approximated as a finite mixture of particular types of kernel distributions (e.g., Gaussian). In our approach, we restrict ourselves to discrete problem domains, which allows us to drop all assumptions about the form of the underlying probability distribution. Modeling the probability distribution by a set of Bayesian prototype vectors, allows us to use the theory of Bayesian networks [14, 12] to construct a treestructured Bayesian network representation of the probability distribution. The approach resembles the simple Bayesian model of [5] and [6], but unlike these models, our network can be used also for general regression tasks (adaptation), and not only for classification tasks (prototype matching). What is more, these tasks can be performed efficiently on a massively parallel neural network architecture, as there is a direct mapping from our Bayesian tree structure to a multilayer feedforward neural network structure.
The accuracy of our model depends on how well the chosen Bayesian prototypes reflect the actual probability distribution. In [10, 11], the prototypes
60