Appears in Cowan, J.D., Tesauro, G., and Alspector, J. (eds.).
Advances in Neural Information Processing Systems 6
Morgan Kaufmann Pub., 1994
Credit Assignment through Time:
Alternatives to Backpropagation
Yoshua Bengio*
Dept. Informatique et Recherche Opérationnelle
Université de Montréal
Montreal, Qc H3C-3J7

Dip. di Sistemi e Informatica
Università di Firenze
50139 Firenze (Italy)
Learning to recognize or predict sequences using long-term context has many applications. However, practical and theoretical problems are found in training recurrent neural networks to perform tasks in which input/output dependencies span long intervals. Starting from a mathematical analysis of the problem, we consider and compare alternative algorithms and architectures on tasks for which the span of the input/output dependencies can be controlled. Results on the new algorithms show performance qualitatively superior to that obtained with backpropagation.
Recurrent neural networks can, in principle, learn to map input sequences to output sequences. Machines that could efficiently learn such tasks would be useful for many applications involving sequence prediction, recognition, or production.
However, practical difficulties have been reported in training recurrent neural networks to perform tasks in which the temporal contingencies present in the input/output sequences span long intervals. In fact, we can prove that dynamical systems such as recurrent neural networks become increasingly difficult to train with gradient descent as the duration of the dependencies to be captured increases. A mathematical analysis of the problem shows that one of two conditions must arise in such systems. In the first case, the dynamics of the network allow it to reliably store bits of information (with bounded input noise), but gradients (with respect to an error at a given time step) vanish exponentially fast as one propagates them
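The vanishing-gradient effect described above can be illustrated numerically. The following sketch (not from the paper; the network size, weight scale, and random seed are illustrative assumptions) simulates a simple recurrent map h_t = tanh(W h_{t-1}) and accumulates the Jacobian of h_T with respect to h_0 as a product of per-step Jacobians. When the weights keep the dynamics contractive, which is what reliable information storage requires, the norm of this product shrinks exponentially with the time span:

```python
import numpy as np

# Illustrative sketch: for h_t = tanh(W @ h_{t-1}), the gradient of h_T
# w.r.t. h_0 is a product of T one-step Jacobians. If each step is a
# contraction (spectral norm of the Jacobian < 1), this product's norm
# decays exponentially in T, so error signals carry almost no
# information about states far in the past.

rng = np.random.default_rng(0)          # arbitrary seed for reproducibility
n = 8                                   # illustrative state dimension
W = 0.1 * rng.standard_normal((n, n))   # small weights -> contractive dynamics

h = rng.standard_normal(n)              # random initial state h_0
J = np.eye(n)                           # accumulated Jacobian d h_t / d h_0
norms = []
for t in range(50):
    h = np.tanh(W @ h)
    # One-step Jacobian: diag(1 - tanh(pre)**2) @ W, with tanh(pre) == h
    J = (np.diag(1.0 - h ** 2) @ W) @ J
    norms.append(np.linalg.norm(J, 2))  # spectral norm after t+1 steps

# norms decays roughly geometrically: the gradient through 50 steps is
# many orders of magnitude smaller than through one step.
print(norms[0], norms[9], norms[49])
```

Because |tanh'| ≤ 1, each step can amplify the gradient by at most the spectral norm of W; with contractive weights this bound is below one, which is the source of the exponential decay the analysis refers to.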
*also, AT&T Bell Labs, Holmdel, NJ 07733