On the Problem of Local Minima in Recurrent Neural Networks*
M. Bianchini, M. Gori, Member, IEEE, and M. Maggini
Dipartimento di Sistemi e Informatica, Università di Firenze
Via di Santa Marta 3 - 50139 Firenze - Italy
Tel. +39 (55) 479.6265 - Fax +39 (55) 479.6363
e-mail : [email protected]
Abstract
Many researchers have recently focused their efforts on devising efficient algorithms, mainly based on optimization schemes, for learning the weights of recurrent neural networks. As with feedforward networks, however, these learning algorithms may get stuck in local minima during gradient descent, thus discovering sub-optimal solutions. This paper analyzes the problem of optimal learning in recurrent networks by proposing some sufficient conditions which guarantee local minima free error surfaces. An example is given which also shows the constructive role of the proposed theory in designing networks suitable for solving a given task. Moreover, a formal relationship between recurrent and static feedforward networks is established, such that the examples of local minima for feedforward networks already known in the literature can be associated with analogous ones in recurrent networks.
Index Terms: Backpropagation through time, feedforward networks, learning environment, linearly separable patterns, local minima, recurrent networks, time unfolding.
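To make the time-unfolding relationship mentioned in the abstract concrete, here is a minimal sketch in standard notation (the symbols x, u, W, V, f are illustrative and are not the paper's own): a first-order recurrent network updates its state according to

\[
x(t+1) = f\big(W\,x(t) + V\,u(t)\big), \qquad t = 0, \ldots, T-1 .
\]

Unfolding these dynamics over a horizon of T steps yields a static feedforward network with T layers sharing the same weight matrices (W, V), in which layer t computes x(t+1) from x(t) and u(t). Backpropagation through time is then ordinary backpropagation applied to the unfolded network, which is the mechanism by which known examples of local minima for feedforward networks can be associated with analogous ones in recurrent networks.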
I Introduction
In the last few years many researchers have focused their efforts on recurrent neural networks
because of their appealing ability to exhibit dynamic behavior. This is quite a general
research topic, since these networks, depending on their architecture and weights, can be trained
to behave as oscillators (see e.g. [23]), as associative memories (e.g. [1, 24]), and also as finite
automata (e.g. [8, 10, 16, 31, 32]). This wide range of behavior is, at the same time, both the
strength and the weakness of recurrent networks: they represent a very powerful computational
model, but designing proper architectures for a given problem and devising effective learning
procedures are very challenging tasks.
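As an aside on the finite-automaton behavior cited above, the following sketch (not part of the original paper; the construction and all weights are hand-chosen assumptions) shows how a small recurrent network of threshold units can realize the two-state parity automaton, whose state flips on every input 1. Since XOR is not linearly separable, one hidden layer of two units is needed before the state unit.

```python
def step(x):
    # Heaviside threshold activation
    return 1.0 if x > 0 else 0.0

def parity_step(s, u):
    """One transition of the two-state parity automaton, built from
    three threshold units: next state = s XOR u."""
    h_or = step(s + u - 0.5)          # fires if s OR u is active
    h_and = step(s + u - 1.5)         # fires if s AND u are active
    return step(h_or - h_and - 0.5)   # (s OR u) AND NOT (s AND u) = s XOR u

s = 0.0  # initial state: even parity
for u in [1, 0, 1, 1, 0]:
    s = parity_step(s, u)
print(s)  # 1.0, since the sequence contains an odd number of 1s
```

Here the recurrent connection is the feedback of the state s into the next transition; the weights are fixed by hand rather than learned, which is precisely the kind of design question, together with learning such weights by gradient descent, that motivates the analysis in this paper.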
*This research was partially supported by MURST 40%.