The Loss Surface of XOR Artificial Neural Networks

Dhagash Mehta; Xiaojun Zhao; Edgar A. Bernal; David J. Wales

arXiv:1804.02411 · stat.ML · May 30, 2018

TL;DR

This paper investigates the complex loss landscapes of XOR neural networks using molecular science optimization tools, revealing how network size and regularization affect minima and saddle points, with implications for network training and compression.

Contribution

The paper analyzes neural network loss surfaces with energy landscape tools from molecular science, showing how the numbers of minima and saddle points change with network size and regularization.

Findings

01

The number of local minima and transition states (saddle points of index one) grows rapidly with the number of nodes in the network.

02

Increasing the regularization term makes the landscape more convex, with fewer minima.

03

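Stationary points of a network with $N_{h}$ hidden nodes are also stationary points of the network with $N_{h} + 1$ hidden nodes, subject to a condition on the weights involving the additional node.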

Abstract

Training an artificial neural network involves an optimization process over the landscape defined by the cost (loss) as a function of the network parameters. We explore these landscapes using optimisation tools developed for potential energy landscapes in molecular science. The number of local minima and transition states (saddle points of index one), as well as the ratio of transition states to minima, grow rapidly with the number of nodes in the network. There is also a strong dependence on the regularisation parameter, with the landscape becoming more convex (fewer minima) as the regularisation term increases. We demonstrate that in our formulation, stationary points for networks with $N_{h}$ hidden nodes, including the minimal network required to fit the XOR data, are also stationary points for networks with $N_{h} + 1$ hidden nodes when all the weights involving the additional nodes…
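To make the abstract's vocabulary concrete, the following is a minimal sketch of a regularised XOR loss and of how a stationary point can be classified by its Hessian index (0 negative eigenvalues for a minimum, exactly 1 for a transition state). Everything specific here is an assumption for illustration and is not taken from the paper: the 2-$N_{h}$-1 architecture with tanh hidden units and a sigmoid output, the mean squared error, the L2 penalty `lam * sum(theta**2)`, the finite-difference Hessian, and the helper names `unpack`, `loss`, `numerical_hessian`, and `hessian_index`.

```python
import numpy as np

# XOR truth table: four inputs and their targets.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = np.array([0., 1., 1., 0.])

def unpack(theta, n_hidden):
    """Split a flat vector of 4 * n_hidden + 1 parameters into network weights."""
    i = 0
    W1 = theta[i:i + 2 * n_hidden].reshape(2, n_hidden)
    i += 2 * n_hidden
    b1 = theta[i:i + n_hidden]
    i += n_hidden
    w2 = theta[i:i + n_hidden]
    i += n_hidden
    b2 = theta[i]
    return W1, b1, w2, b2

def loss(theta, n_hidden=2, lam=1e-3):
    """Mean squared error on XOR plus an L2 penalty (assumed form, not the paper's)."""
    W1, b1, w2, b2 = unpack(theta, n_hidden)
    hidden = np.tanh(X @ W1 + b1)
    out = 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))
    return np.mean((out - Y) ** 2) + lam * np.sum(theta ** 2)

def numerical_hessian(f, theta, eps=1e-4):
    """Central finite-difference Hessian of f at theta, symmetrised."""
    n = theta.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i = np.zeros(n)
            e_i[i] = eps
            e_j = np.zeros(n)
            e_j[j] = eps
            H[i, j] = (f(theta + e_i + e_j) - f(theta + e_i - e_j)
                       - f(theta - e_i + e_j) + f(theta - e_i - e_j)) / (4 * eps ** 2)
    return 0.5 * (H + H.T)

def hessian_index(H, tol=1e-6):
    """Count negative eigenvalues: 0 for a minimum, 1 for a transition state."""
    return int(np.sum(np.linalg.eigvalsh(H) < -tol))
```

The index is only meaningful at a stationary point, so in practice one would first locate a point where the gradient vanishes (for example with a standard optimiser) and then call `hessian_index(numerical_hessian(loss, theta))`. Varying `lam` and `n_hidden` in this toy setting is one way to probe the dependence on regularisation and network size that the abstract describes.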

Taxonomy

Methods: Dropout