Learning in Multilayered Networks Used as Autoassociators
M. Bianchini, P. Frasconi, and M. Gori, Member, IEEE
Dipartimento di Sistemi e Informatica, Università di Firenze
Via di Santa Marta 3 - 50139 Firenze - Italy
Tel. +39 (55) 479.6265 - Fax +39 (55) 479.6363
e-mail : [email protected]
Gradient descent learning algorithms may get stuck in local minima, thus making the learning sub-optimal. In this paper, we focus attention on multilayered networks used as autoassociators and show some relationships with classical linear autoassociators. In addition, using the theoretical framework of our previous research, we derive a condition which is met at the end of the learning process and show that this condition has a very intriguing geometrical meaning in the pattern space.
Index Terms- Autoassociators, Backpropagation, multilayered networks.
This research was partially supported by MURST 40%.
It is commonly held that using multilayered networks successfully requires a thorough understanding of the learning process. Among the theoretical problems related to Backpropagation (BP), that of local minima in the error surface is one of the most important. Numerous researchers have investigated it, looking both for examples of local minima and for conditions that guarantee error surfaces free of local minima. A general analysis of the problem, which also contains numerous references, can be found in , where some conditions are given that guarantee error surfaces free of local minima in the case of pyramidal networks. Roughly speaking, optimal convergence of BP is guaranteed for networks with many inputs and many hidden units. In , that analysis is specialized to the case of linearly separable patterns, a condition that is likely to hold for patterns represented by "many coordinates". In [6, 7], the absence of local minima is guaranteed for networks with one hidden layer and as many hidden units as patterns. Such a condition obviously leads to nets with "many hidden units".
In this paper, we propose an analysis of learning in multilayered networks that turns out to be particularly useful for autoassociators. We discuss the role of the hidden neuron nonlinearity by showing that, unlike linear autoassociators, any network configuration autoassociates patterns of a limited region. Concerning the problem of local minima, however, we give an example that shows the