Unimodal Loading Problems
Monica Bianchini1, Stefano Fanelli2, Marco Gori1, and Marco Protasi2
1Dipartimento di Sistemi e Informatica, Universit?a degli Studi di Firenze
Via di Santa Marta, 3 - 50139 Firenze - Italy
Tel. +39 (55) 479.6265 - Fax +39 (55) 479.6363
2Dipartimento di Matematica, Universit?a di Roma Tor Vergata"
Via della Ricerca Scientifica - 00133 Roma - Italy
Tel. +39 (6) 7259.4681 - Fax +39 (6) 7259.4699
This paper deals with optimal learning and provides a unified viewpoint of most significant results in the field. The focus is on the problem of local minima in the cost function that is likely to affect more or less any learning algorithm. We give some intriguing links between optimal learning and the computational complexity of loading problems. We exhibit a computational model such that the solution of all loading problems giving rise to unimodal error functions require the same time, thus suggesting that they belong to the same computational class.
Keywords: Backpropagation, computational complexity, optimal learning, premature saturation, spurious and structural local minima, terminal attractor.
1 Learning as optimisation
Supervised learning in multilayered networks (MLNs) can be accomplished thanks to
Backpropagation (BP), which is used to minimise pattern misclassifications by means of
gradient descent for a particular nonlinear least squares fitting problem. Unfortunately,
BP is likely to be trapped in local minima and indeed many examples of local extremes
have been reported in the literature.
The presence of local minima derives essentially from two different reasons. First, they may arise because of an unsuitable joint choice of the functions which defines the network dynamics and the error function. Second, local minima may be inherently related to the structure of the problem at hand. In , these two cases have been referred to as spurious and structural local minima, respectively. Problems of sub-optimal solutions may also arise when learning with high initial weights, as a sort of premature neuron saturation arises, which is strictly related to the neuron fan-in. An interesting way of facing this problem is to use the relative cross-entropy metric " , for which the erroneous saturation of the output neurons does not lead to plateaux, but to very high values of the