
On the Efficient Classification of Data Structures
by Neural Networks
Paolo Frasconi
Dipartimento di Sistemi
e Informatica
Universit?a di Firenze
Via di Santa Marta 3
50139 Firenze, Italy
Marco Gori
Dipartimento di Ingegneria
dell'Informazione
Universit?a di Siena
Via Roma 56
53100 Siena, Italy
Alessandro Sperduti
Dipartimento di Informatica
Universit?a di Pisa
Corso Italia, 40
56125 Pisa, Italy
Abstract
In the last few years it has been shown that recurrent neural networks are adequate for processing general data structures like trees and graphs, which opens the doors to a number of new interesting applications previously unexplored. In this paper, we analyze the efficiency of learning the membership of DOAGs (Directed Ordered Acyclic Graphs) in terms of local minima of the error surface by relying on the principle that their absence is a guarantee of efficient learning. We give sufficient conditions under which the error surface is local minima free. Specifically, we define a topological index associated with a collection of DOAGs that makes it possible to design the architecture so as to avoid local minima.
1 Introduction
It is wellknown that connectionist models are not only capable of dealing with static patterns, but also with sequential inputs. Real world, however, often proposes structured domains that can hardly be represented by simple sequences. For instance, there are cases in which the information can be framed naturally in graphs of variable size, and one may be interested in processing these structures as a whole, and not pay attention specifically to their nodes. The ability to classify these structured data is fundamental in a number of different applications such as medical and technical diagnoses, molecular biology and chemistry, automated reasoning, software engineering, geometrical and spatial reasoning, and pattern recognition. Neural networks for processing data structures have been proposed by Pollack [1990] and, recently, by Sperduti, Starita & G?oller [1995], and by Sperduti & Starita [1997]. It has been shown that they can actually be used for classifying data structures by using an algorithm, referred to
as BPTS (backpropagation through structure), that extends naturally the time unfolding carried out by BPTT, in the case of sequences. It has also been pointed out that BPTS is significantly better suited for dealing with longterm dependencies than BPTT, because of its inherent unfolding through structures instead of simple lists, the data structure counterpart of sequences. As for any neural network learning algorithm, however, the efficiency of BPTS, may be seriously plagued by the presence of local minima in the associated error function. In the epilogue of the expanded edition of Perceptron, Minsky [1988] pointed out that
... as the field of connectionism becomes more mature, the quest for a general solution to all learning problems will evolve into an understanding of which types of learning processes are likely to work on which classes of learning problems. And this means that, past a certain point, we won't be able to get by with vacuous generalities about hillclimbing. We will really need to know a great deal more about the nature of those surfaces for each specific realm of problems that we want to solve."
In this paper, we analyze the efficiency of learning the membership of DOAGs (Directed Ordered Acyclic Graphs) in terms of local minima of the error surface by relying on the principle that their absence is a guarantee of efficient learning. We give a sufficient condition under which the error surface is local minima free. In particular, we define a topological index associated with a collection of DOAGs that make it possible to design the architecture so as to avoid local minima. The condition we give holds for any training set composed of graphs with symbolic nodes and a neural network capable of learning the assigned data.
2 Recurrent networks for processing of
data structures
In this section, we review briefly the basic idea proposed in [Sperduti and Starita, 1997] concerning adaptive