
SUCCESSES AND FAILURES OF BACKPROPAGATION: A THEORETICAL INVESTIGATION
P. Frasconi, M. Gori, and A. Tesi
Dipartimento di Sistemi e Informatica, Università di Firenze
Via di Santa Marta 3, 50139 Firenze, Italy
Tel. (39) 554796265  Telex 580681 UNFING I  Fax (39) 554796363
email: [email protected]
1 Introduction
Backpropagation is probably the most widely applied neural network learning algorithm. Backprop's popularity is related to its ability to deal with complex multidimensional mappings; in the words of Werbos [56], the algorithm "goes beyond regression". Backprop's theory touches many disciplines and has been developed by several different research groups. As pointed out by le Cun [38], the basic elements of the theory can, to some extent, be traced back to the famous book of Bryson and Ho [9]. More explicit statements of the algorithm were proposed by Werbos [56], Parker [43], le Cun [36], and members of the PDP group [44]. Although many researchers have contributed in different ways to the development of different aspects of Backprop, there is no question that Rumelhart and the PDP group deserve the credit for the current wide diffusion of the algorithm. As Widrow points out in [57], what is actually new in Backprop is the adoption of "squashing" instead of "hard-limiting" nonlinearities, an idea that never occurred to neural network researchers throughout the sixties. More importantly, thanks to that idea, algorithms based on the optimization of "regular" (smooth) functions can be used for the learning process. In addition to this nice property, which opened the doors to a methodology different from that of the Perceptron learning algorithm, the