
MDL Learning of Probabilistic Neural
Networks for Discrete Problem Domains
Proceedings of the IEEE World Congress on Computational Intelligence (Orlando, June 1994), 1493-1497.
Henry Tirri and Petri Myllymäki
Abstract. Given a problem, a case-based reasoning (CBR) system searches its case memory and uses the stored cases to find a solution, possibly modifying the retrieved cases to adapt them to the required input specifications. In discrete domains, CBR reasoning can be based on a rigorous Bayesian probability propagation algorithm. Such a Bayesian CBR system can be implemented as a probabilistic feedforward neural network with one of the layers representing the cases. In this paper we introduce a Minimum Description Length (MDL) based learning algorithm for obtaining the proper network structure together with the associated conditional probabilities. This algorithm, together with the resulting neural network implementation, provides a massively parallel architecture for solving the efficiency bottleneck in case-based reasoning.
I. Introduction
Our research is motivated by combining the theory of case-based reasoning (CBR) with neural network techniques, especially for developing massively parallel implementations for "shallow knowledge" domains such as the ones discussed in [4]. In the case-based reasoning paradigm, the dynamic case memory is central to the reasoning process, and learning is an inherent part of that process [5]. Although the idea of using a case memory is simple in principle, there are many open issues related to case indexing, the similarity metric used in case matching, and case adaptation. For example, much of the published work has concentrated on applying machine learning methods for indexing cases in order to avoid costly comparisons against the full case set. However, as recent work demonstrates [10], a Bayesian reasoning framework [13, 11] can be used to develop theoretically sound solutions for the CBR reasoning process.
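To make the Bayesian view of case retrieval concrete, the following sketch (our own illustrative example, not the algorithm of [10]) scores stored cases by their posterior probability given observed attribute values, assuming each case carries a prior P(c) and per-attribute conditionals P(A_i = a | c); the case names and dictionary layout are hypothetical.

```python
def posterior_over_cases(cases, observation):
    """Posterior P(c | observation), proportional to P(c) * prod_i P(A_i = a_i | c)."""
    scores = {}
    for name, case in cases.items():
        p = case["prior"]
        for attr, value in observation.items():
            # Small floor for attribute values never seen with this case.
            p *= case["cond"][attr].get(value, 1e-6)
        scores[name] = p
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}

# Two hypothetical stored cases over binary-valued attributes A1, A2.
cases = {
    "c1": {"prior": 0.5, "cond": {"A1": {"x": 0.9, "y": 0.1}, "A2": {"u": 0.8, "v": 0.2}}},
    "c2": {"prior": 0.5, "cond": {"A1": {"x": 0.2, "y": 0.8}, "A2": {"u": 0.3, "v": 0.7}}},
}
post = posterior_over_cases(cases, {"A1": "x", "A2": "u"})
best = max(post, key=post.get)  # the best-matching case
```

Retrieval then reduces to picking the case with the highest posterior, which replaces an ad-hoc similarity metric with a probabilistically interpretable one.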
The authors are with the Department of Computer Science, P.O. Box 26, FIN-00014 University of Helsinki, Finland. This research was supported by the Technology Development Center (TEKES).

Figure 1: Bayesian network representation of the case base (a case node C connected to attribute nodes A1, ..., Am).

On the other hand, neural network architectures are constructed from a large number of highly connected elements. This makes them suitable for distortion-tolerant storage of a large number of cases (represented by high-dimensional vectors) by making single elements "sensitive" to a stored item. In addition, as there is nothing that resembles a shared memory, the neural computing architecture is inherently parallel, and each element can compare its input against the value stored in its interconnections independently of the others. This offers a linear speedup in the comparison process with respect to the number of computational elements available. It means that, for example, the case indexing problem [5] can be addressed directly at the architectural level, since matching can be performed using the available parallelism. Consequently, for discrete domains, Bayesian CBR reasoning and a well-matching neural architecture complement each other, providing the benefits of both approaches.
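The independence of the per-element comparisons can be sketched as follows (an illustrative example of the idea, not the paper's implementation): each score depends only on one stored case vector, so the comparisons can be mapped one case per processing element; the Hamming-style match function and the thread pool here stand in for the parallel hardware.

```python
from concurrent.futures import ThreadPoolExecutor

def hamming_match(stored, query):
    """Number of attribute positions where a stored case agrees with the query."""
    return sum(s == q for s, q in zip(stored, query))

def best_matching_case(case_memory, query):
    # Each score reads only one stored vector, so the map is embarrassingly
    # parallel: with one element per case the comparison time is constant.
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(lambda case: hamming_match(case, query), case_memory))
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical case memory of binary attribute vectors.
memory = [(0, 1, 1, 0), (1, 1, 0, 0), (0, 0, 1, 1)]
best = best_matching_case(memory, (0, 1, 1, 1))
```

No index structure is needed here: the "indexing problem" disappears because every case is compared simultaneously.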
In [9] we presented a neural network architecture for case-based reasoning by adopting Pearl's probability propagation calculus and representing it as a multilayer feedforward probabilistic network. This approach has the advantages discussed above: efficient case matching inherent in the parallel architecture, a theoretically sound Bayesian interpretation of the case-space metric, and one possible solution to case adaptation via the same kind of probability propagation as is used for case matching. This network is related to the probabilistic networks of Specht [19, 20], and to other kernel-based estimators [12, 1], which assume that the domain probability distribution can be approximated as a finite mixture of kernel distributions of a certain type (e.g., Gaussian). In
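The finite-mixture view also shows how the same propagation step serves case adaptation: a missing attribute is predicted by marginalizing over the case posterior. The sketch below is our own illustration under that assumption (discrete product kernels, one mixture component per stored case), not Specht's or the paper's code; all names are hypothetical.

```python
def predict_missing(cases, observed, target_attr, target_values):
    """P(target_attr = v | observed) under a finite mixture with one component per case."""
    # Posterior weight of each case (mixture component) given the observed attributes.
    weights = {}
    for name, case in cases.items():
        w = case["prior"]
        for attr, value in observed.items():
            w *= case["cond"][attr][value]
        weights[name] = w
    z = sum(weights.values())
    # Mixture prediction: sum over cases of P(c | observed) * P(v | c).
    return {v: sum(weights[n] / z * cases[n]["cond"][target_attr][v] for n in cases)
            for v in target_values}

cases = {
    "c1": {"prior": 0.6, "cond": {"A1": {"x": 0.9, "y": 0.1}, "A2": {"u": 0.7, "v": 0.3}}},
    "c2": {"prior": 0.4, "cond": {"A1": {"x": 0.1, "y": 0.9}, "A2": {"u": 0.2, "v": 0.8}}},
}
pred = predict_missing(cases, {"A1": "x"}, "A2", ["u", "v"])
```

In network terms, the case layer computes the posterior weights and the output layer mixes the per-case conditionals, so matching and adaptation share one forward pass.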