Optimal Convergence of On-Line Backpropagation
M. Gori and M. Maggini
Dipartimento di Sistemi e Informatica
Universit?a di Firenze
Via di S. Marta, 3
50139 Firenze (Italia)
tel: +39 55 479.6265, fax: +39 55 479.6363
e-mail: fmarco,[email protected]
Many researchers are quite skeptical about the actual behavior of neural network learning algorithms like Backpropagation. One of the major problems is with the lack of clear theoretical results on optimal convergence, particularly for pattern mode algorithms.
In this paper, we prove the companion of Rosenblatt's PC (Perceptron Convergence) theorem for feedforward networks, stating that pattern mode Backpropagation converges to an optimal solution for linearly-separable patterns.
A remarkable aspect of the current debate on neural networks concerns the theoretical foundations of commonly used learning algorithms.
As far as optimal convergence is concerned, Minsky and Papert, in their extended edition of Perceptron , give an intriguing epilogue on PDP's novel issues. They point out that what the PDP group call a powerful new learning result" is othing more than a straightforward hill-climbing algorithm". Minsky and Papert's issues call for the need to give optimal learning a clear theoretical foundation.
Recently, some efforts have been made to understand the behavior of batch mode Backpropation by the analysis of the shape of the error surfaces. In particular, the emphasis has been placed on the problem of local minima and on conditions that guarantee their absence [2, 3, 4, 5, 6, 7].
To the best of our knowledge, however, no attempt has been made to investigate the optimal convergence
of on-line Backpropagation 1, that is used successfully in many practical applications. As earlier
pointed out in , the on-line updating departs to some extent from the true gradient descent, particularly
when the learning rate is not small enough with respect to the number of training patterns. On the
other hand, the on-line updating is not just an approximation of gradient descent and, most of the time,
has a significantly different behavior.
1Unlike batch mode, that is based on exact gradient computation, in on-line Backpropagation the weights are updated after having presented each pattern.