Computational Capabilities of Local-Feedback Recurrent
Networks Acting as Finite-State Machines
Dipartimento di Sistemi e Informatica
Universit?a di Firenze (Italy)
August 8, 1996
In this paper we explore the expressive power of recurrent networks with local feedback connections for symbolic data streams. We rely on the analysis of the maximal set of strings that can be shattered by the concept class associated to these networks (i.e., strings that can be arbitrarily classified as positive or negative), and find that their expressive power is inherently limited, since there are sets of strings that cannot be shattered, regardless of the number of hidden units. Although the analysis holds for networks with hard-threshold units, we claim that the incremental computational capabilities gained when using sigmoidal units are severely paid in terms of robustness of the corresponding representation.
The computational capabilities of artificial neural networks have recently received the attention of many researchers, particularly for the case of multilayer perceptrons (see e.g. ). For the case of recurrent neural networks (RNNs), many efforts have focused on symbolic tasks in which the input is restricted to a finite alphabet. A noticeable result is the one by Siegelman & Sontag , stating that such networks are computationally equivalent to Turing machines. Many researchers, however, have raised the question whether recurrent networks with a constrained architecture are still endowed with universal computational capabilities. The issue is motivated by at least two factors. First, fully recurrent networks are likely to be more difficult to train than other recurrent networks with a constrained architecture . Second, networks with a simpler architecture may yield better generalization to new examples, according to some generic parsimony principle. Some results on the computational capabilities of recurrent networks with constrained architecture have also been established. Siegelman et al. , for example, have shown that recurrent nonlinear autoregressive (NARX) networks (in which feedback is allowed from the output to the input, but not among the hidden units) are also Turing equivalent, except for a linear slowdown.
In this paper we consider a class of networks in which the recurrent connections are restricted to self-loops in a subset of the computational units. This is the simplest form of recurrence that can be considered. Networks having such topological constraint are referred to as local feedback recurrent networks (LFRNs)  or globally feedforward locally recurrent networks . One advantage of these networks is that they can be trained using a special version of the error backpropagation algorithm that is local in both space and time (see, e.g., [7, 8, 9]), thus allowing on-line training without incurring in the heavy computational burden of other local algorithms such as real-time recurrent learning (see, e.g., ).
Horne & Giles  have experimentally found that LFRNs are unexpectedly adequate for inferring finite memory grammars . However, to the best of our knowledge no attempt has been