
Optimal Convergence of Pattern Mode Backpropagation

M. Gori and M. Maggini *

Dipartimento di Sistemi e Informatica

Università di Firenze

Via di S. Marta, 3

50139 Firenze (Italia)

tel: +39 55 479.6265, fax: +39 55 479.6363

e-mail: [email protected]


Many researchers are quite skeptical about the actual behavior of neural network learning algorithms like Backpropagation. One of the major problems is the lack of clear theoretical results on optimal convergence, particularly for pattern mode algorithms.

In this paper, we prove the companion of Rosenblatt's PC (Perceptron Convergence) theorem [1] for feedforward networks, stating that pattern mode Backpropagation converges to an optimal solution for linearly-separable patterns.

1 Introduction

A remarkable aspect of the current debate on neural networks concerns the theoretical foundations of commonly used learning algorithms.

As far as optimal convergence is concerned, Minsky and Papert, in their extended edition of Perceptrons [2], give an intriguing epilogue on PDP's novel issues. They point out that what the PDP group call "a powerful new learning result" is "nothing more than a straightforward hill-climbing algorithm". Minsky and Papert's remarks point to the need for a clear theoretical foundation of optimal learning.

Recently, some efforts have been made to understand the behavior of batch mode Backpropagation through the analysis of the shape of the error surface. In particular, the emphasis has been placed on the problem of local minima and on conditions that guarantee their absence [3, 4, 5, 6, 7, 8].

To the best of our knowledge, however, no attempt has been made to investigate the optimal convergence of pattern mode Backpropagation, which is used successfully in many practical applications. As pointed out earlier in [9], pattern mode departs to some extent from true gradient descent, particularly when the learning rate is not small enough with respect to the number of training patterns. On the other hand, pattern mode is not just an approximation of gradient descent and, most of the time, exhibits significantly different behavior.
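The distinction can be made concrete with a minimal sketch (not the paper's formulation): a single linear unit trained for a few epochs by batch gradient descent, which accumulates the gradient over all patterns before updating, versus pattern mode, which updates the weights after every single pattern. Even with the same learning rate, the two weight trajectories differ, since in pattern mode each gradient is evaluated at weights already modified by earlier patterns. All names and parameter values below are illustrative assumptions.

```python
import numpy as np

def batch_epoch(w, X, y, eta):
    # One epoch of true gradient descent: accumulate the gradient of the
    # squared error over all patterns, then apply a single weight update.
    grad = sum((np.dot(w, x) - t) * x for x, t in zip(X, y))
    return w - eta * grad

def pattern_epoch(w, X, y, eta):
    # One epoch of pattern mode: update the weights immediately after
    # each pattern, so later gradients are taken at the new weights.
    for x, t in zip(X, y):
        w = w - eta * (np.dot(w, x) - t) * x
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5])   # targets from a known linear map

wb = np.zeros(3)                     # batch mode weights
wp = np.zeros(3)                     # pattern mode weights
for _ in range(5):
    wb = batch_epoch(wb, X, y, eta=0.02)
    wp = pattern_epoch(wp, X, y, eta=0.02)

print(np.linalg.norm(wb - wp))       # nonzero: the trajectories differ
```

With a learning rate that is small relative to the number of patterns the gap between the two trajectories shrinks; with a larger rate it becomes substantial, which is exactly the regime in which pattern mode cannot be treated as an approximation of gradient descent.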
* He is currently a visiting graduate student with NEC Research Institute, 4 Independence Way, NJ 08540, [email protected]