
Computational Capabilities of Local-Feedback Recurrent
Networks Acting as Finite-State Machines
Paolo Frasconi
Marco Gori
Dipartimento di Sistemi e Informatica
Università di Firenze (Italy)
August 8, 1996
Abstract
In this paper we explore the expressive power of recurrent networks with local feedback
connections for symbolic data streams. We rely on the analysis of the maximal set of strings
that can be shattered by the concept class associated with these networks (i.e., strings that
can be arbitrarily classified as positive or negative), and find that their expressive power
is inherently limited, since there are sets of strings that cannot be shattered, regardless of
the number of hidden units. Although the analysis holds for networks with hard-threshold
units, we claim that the incremental computational capabilities gained by using sigmoidal
units come at a severe cost in the robustness of the corresponding representation.
1 Introduction
The computational capabilities of artificial neural networks have recently received the attention of many researchers, particularly for the case of multilayer perceptrons (see e.g. [1]). For the case of recurrent neural networks (RNNs), many efforts have focused on symbolic tasks in which the input is restricted to a finite alphabet. A notable result is the one by Siegelmann & Sontag [2], stating that such networks are computationally equivalent to Turing machines. Many researchers, however, have raised the question of whether recurrent networks with a constrained architecture are still endowed with universal computational capabilities. The issue is motivated by at least two factors. First, fully recurrent networks are likely to be more difficult to train than other recurrent networks with a constrained architecture [3]. Second, networks with a simpler architecture may yield better generalization to new examples, according to some generic parsimony principle. Some results on the computational capabilities of recurrent networks with constrained architecture have also been established. Siegelmann et al. [4], for example, have shown that recurrent nonlinear autoregressive (NARX) networks (in which feedback is allowed from the output to the input, but not among the hidden units) are also Turing equivalent, except for a linear slowdown.
In this paper we consider a class of networks in which the recurrent connections are restricted to self-loops in a subset of the computational units. This is the simplest form of recurrence that can be considered. Networks having such a topological constraint are referred to as local feedback recurrent networks (LFRNs) [5] or globally feedforward locally recurrent networks [6]. One advantage of these networks is that they can be trained using a special version of the error backpropagation algorithm that is local in both space and time (see, e.g., [7, 8, 9]), thus allowing online training without incurring the heavy computational burden of other local algorithms such as real-time recurrent learning (see, e.g., [10]).
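To make the topological constraint concrete, the following sketch shows one time step of such a network. All names, the `tanh` nonlinearity, and the single-hidden-layer layout are our own illustrative choices, not taken from the paper; the essential point is that the recurrent weights form a diagonal matrix, so each hidden unit feeds back only onto itself.

```python
import numpy as np

def lfrn_step(x_prev, u, W_in, w_self, W_out, b, c):
    """One time step of a local-feedback recurrent network (illustrative sketch).

    Each hidden unit i receives the external input u(t) and a single
    recurrent connection from its own previous activation x_i(t-1);
    there are no feedback connections among distinct hidden units,
    so the recurrence is described by the diagonal vector w_self.
    """
    # Self-loop feedback: elementwise product, i.e. a diagonal recurrent matrix.
    x = np.tanh(W_in @ u + w_self * x_prev + b)
    # The output layer is purely feedforward (no feedback from y).
    y = np.tanh(W_out @ x + c)
    return x, y

# Processing a symbol stream: each symbol is one-hot encoded as u(t).
def run_lfrn(inputs, W_in, w_self, W_out, b, c):
    x = np.zeros(W_in.shape[0])
    outputs = []
    for u in inputs:
        x, y = lfrn_step(x, u, W_in, w_self, W_out, b, c)
        outputs.append(y)
    return outputs
```

Because the recurrence is diagonal, the gradient of x_i(t) with respect to the weights of unit i can be accumulated locally during the forward pass, which is what makes the space- and time-local training algorithms cited above possible.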
Horne & Giles [3] have experimentally found that LFRNs are unexpectedly adequate for inferring finite memory grammars [11]. However, to the best of our knowledge no attempt has been