Copyright © 1996 by Leonard G. C. Hamey. Appears in Proc. Seventh Australian Conf. Artificial Neural Networks, pages 179-183, 1996.
Analysis of the Error Surface of the XOR Network with Two
Hidden Nodes
Leonard G. C. Hamey
School of MPCE
Macquarie University NSW 2109 Australia
ABSTRACT
The exclusive-or learning task in a feed-forward neural network with two hidden nodes is investigated. Constraint equations have been derived which fully describe the finite stationary points of the error surface. It is shown that the stationary points occur in a single connected union of eighteen manifolds. A Taylor series expansion is applied to the network error surface and it is shown that all points within the enumerated manifolds are arbitrarily close to points of lower error. It follows that the finite stationary points of the exclusive-or task are saddle points, not relative minima. This result is surprising in view of the commonly held belief that the exclusive-or task exhibits local minima. The present result complements a recent result which proves the absence of regional local minima in the exclusive-or task.
Fig. 1: Feed-forward network to solve the exclusive-or task. (Diagram labels: u11, u12, u21, u22, q1, q2, v1, v2, f.)
1. Introduction
It is well known that back-propagation learning can become trapped when trained on the exclusive-or task with two hidden nodes (figure 1). However, trapped networks, which are commonly attributed to local minima, have been observed to be rare (Rumelhart, Hinton and Williams, 1986), and their occurrence depends upon the initial conditions and the network learning parameters (Kolen and Pollack, 1990). The present paper presents a theoretical analysis of the error surface of the exclusive-or task.
The study of the error surfaces of feed-forward neural networks is hampered by high dimensionality and the difficulty of theoretical analysis. Although some results have been forthcoming, these are for restricted cases. Analyses exist for networks without hidden nodes (Brady, Raghavan and Slawny, 1989; Sontag and Sussmann, 1989; Sontag and Sussmann, 1991), networks comprised of linear nodes (Baldi and Hornik, 1989) and networks with as many hidden nodes as training patterns (Poston, Lee, Choie and Kwon, 1991). In general, networks with fewer hidden nodes than training patterns appear not to be amenable to analysis. A significant exception is the exclusive-or network (figure 1).
Blum (1989) proved the existence of solutions in the exclusive-or learning task. Blum also attempted to prove the existence of a manifold of relative local minima in the error surface, but the proof was flawed, as previously shown by Hamey (1995c) and Sprinkhuizen-Kuyper and Boers (1994b). Lisboa and Perantonis (1991) characterise the stationary points of the error surface, obtaining four classes. Their classes (b)-(d) occur only as points with infinite weight values, but class (a) occurs for finite weight values. Hamey (1995a) proves that the exclusive-or task does not have any regional local minima. Other analyses of the exclusive-or network and related learning tasks may be found in (Gori and Tesi, 1992; Gori and Tesi, 1990; Sprinkhuizen-Kuyper and Boers, 1994a; Sprinkhuizen-Kuyper and Boers, 1994c).
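To make the notion of a finite stationary point concrete, the following is a minimal sketch of the error function of a 2-2-1 network on the four exclusive-or patterns. Several details are assumptions made for illustration and are not taken from the paper: logistic sigmoid nodes, a sum-of-squares error with a factor of one half, the bias names `b1`, `b2` and `c`, the ordering of the weight tuple, and the use of central differences for the gradient. Under these assumptions the all-zero weight vector is a finite stationary point with error 0.5, while a hand-built large-weight point (also hypothetical) has error close to zero.

```python
import math

# The four exclusive-or training patterns: (inputs, target).
PATTERNS = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
            ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]


def sigmoid(z):
    """Logistic activation (assumed; not stated in the source)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(w, x):
    """Output of a 2-2-1 network.

    w = (u11, u12, b1, u21, u22, b2, v1, v2, c), a hypothetical ordering:
    u_ij are input-to-hidden weights, b_i hidden biases,
    v_i hidden-to-output weights and c the output bias.
    """
    u11, u12, b1, u21, u22, b2, v1, v2, c = w
    h1 = sigmoid(u11 * x[0] + u12 * x[1] + b1)
    h2 = sigmoid(u21 * x[0] + u22 * x[1] + b2)
    return sigmoid(v1 * h1 + v2 * h2 + c)


def error(w):
    """Sum-of-squares error over the four patterns (assumed form)."""
    return 0.5 * sum((forward(w, x) - t) ** 2 for x, t in PATTERNS)


def gradient(w, h=1e-6):
    """Central-difference estimate of the error gradient at w."""
    grad = []
    for i in range(len(w)):
        plus, minus = list(w), list(w)
        plus[i] += h
        minus[i] -= h
        grad.append((error(plus) - error(minus)) / (2.0 * h))
    return grad


# All weights zero: every output is 0.5 and the gradient vanishes.
origin = (0.0,) * 9

# A hand-built point of low error: h1 approximates OR, h2 approximates AND,
# and the output approximates "h1 and not h2".
solution = (20.0, 20.0, -10.0, 20.0, 20.0, -30.0, 20.0, -20.0, -10.0)

if __name__ == "__main__":
    print("error at origin:", error(origin))
    print("largest gradient component at origin:",
          max(abs(g) for g in gradient(origin)))
    print("error at hand-built solution:", error(solution))
```

A vanishing gradient only establishes that the origin is stationary under the assumed error function; the sketch does not show whether such a point is a minimum or a saddle.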
The present paper extends the results of Hamey (1995a) by considering finite relative local minima. A point $w_0$ is said to be a relative minimum of a function $f(w)$ if there exists $\epsilon > 0$ such that $f(w_0 + \Delta w) \ge f(w_0)$ for all $|\Delta w| < \epsilon$. As discussed in Hamey (1995a), this definition is unsuitable for the consideration of minima that occur with infinite weights, hence the adoption of an alternative definition of local minimum in that paper. In the present paper, we examine in detail the manifolds of weight space that satisfy class (a) of Lisboa and Perantonis