The Goldilocks zone: Towards better understanding of neural network loss landscapes

Stanislav Fort; Adam Scherlis

arXiv:1807.02581·cs.LG·November 13, 2018

TL;DR

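This paper probes the loss landscapes of neural networks on random low-dimensional hyperplanes and hyperspheres. It identifies a 'Goldilocks zone': a shell of configuration-space radii where the Hessian has an excess of positive eigenvalues and local convexity is high. Initializing networks inside this zone is associated with faster training.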

Contribution

The study introduces the concept of the Goldilocks zone in neural network loss landscapes and links it to initialization quality and training efficiency.

Findings

01

The Goldilocks zone exhibits an excess of positive eigenvalues of the Hessian.

02

High convexity measures in this zone correlate with better network initialization.

03

Initializing networks within this zone leads to faster training on MNIST.

Abstract

We explore the loss landscape of fully-connected and convolutional neural networks using random, low-dimensional hyperplanes and hyperspheres. Evaluating the Hessian, $H$, of the loss function on these hypersurfaces, we observe 1) an unusual excess of the number of positive eigenvalues of $H$, and 2) a large value of $\mathrm{Tr}(H)/\|H\|$ at a well defined range of configuration space radii, corresponding to a thick, hollow, spherical shell we refer to as the Goldilocks zone. We observe this effect for fully-connected neural networks over a range of network widths and depths on MNIST and CIFAR-10 datasets with the ReLU and $\tanh$ non-linearities, and a similar effect for convolutional networks. Using our observations, we demonstrate a close connection between the Goldilocks zone, measures of local convexity/prevalence of positive curvature, and the suitability of…
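
The measurement described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: a hypothetical toy loss (`toy_loss`, a one-input $\tanh$ network with mean-squared error rather than a classifier on MNIST or CIFAR-10), a finite-difference Hessian instead of automatic differentiation, and the Frobenius norm as the choice of $\|H\|$. All function names are illustrative. The sketch picks a point at radius $r$, spans a random $d$-dimensional hyperplane through it, evaluates the Hessian of the loss restricted to that hyperplane, and reports $\mathrm{Tr}(H)/\|H\|$.

```python
import math
import random

def random_unit_vectors(d, D, rng):
    """Return d orthonormal vectors in R^D (Gram-Schmidt on Gaussian draws)."""
    basis = []
    while len(basis) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(D)]
        for b in basis:
            dot = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - dot * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > 1e-12:
            basis.append([vi / norm for vi in v])
    return basis

def restricted_hessian(loss, point, basis, h=1e-3):
    """Central finite-difference Hessian of loss on the hyperplane point + span(basis)."""
    d = len(basis)

    def g(z):
        theta = list(point)
        for zi, b in zip(z, basis):
            theta = [t + zi * bk for t, bk in zip(theta, b)]
        return loss(theta)

    H = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i, d):
            total = 0.0
            for si, sj in ((1, 1), (1, -1), (-1, 1), (-1, -1)):
                z = [0.0] * d
                z[i] += si * h
                z[j] += sj * h
                total += si * sj * g(z)
            H[i][j] = H[j][i] = total / (4.0 * h * h)
    return H

def trace_over_norm(H):
    """Tr(H) / ||H||, with ||H|| taken as the Frobenius norm (an assumption)."""
    trace = sum(H[i][i] for i in range(len(H)))
    frob = math.sqrt(sum(x * x for row in H for x in row))
    return trace / frob if frob > 0.0 else 0.0

def toy_loss(theta, hidden=4):
    """Hypothetical stand-in loss: 1-input tanh network, MSE on three fixed points."""
    w1 = theta[:hidden]
    b1 = theta[hidden:2 * hidden]
    w2 = theta[2 * hidden:3 * hidden]
    b2 = theta[3 * hidden]
    data = [(-1.0, 0.5), (0.0, -0.5), (1.0, 1.0)]
    total = 0.0
    for x, y in data:
        out = b2 + sum(w2[k] * math.tanh(w1[k] * x + b1[k]) for k in range(hidden))
        total += (out - y) ** 2
    return total / len(data)

def scan_radii(loss, D, d, radii, seed=0):
    """For each radius r, measure Tr(H)/||H|| on one random d-dim hyperplane."""
    rng = random.Random(seed)
    results = []
    for r in radii:
        direction = random_unit_vectors(1, D, rng)[0]
        point = [r * x for x in direction]       # point on the sphere of radius r
        basis = random_unit_vectors(d, D, rng)   # random hyperplane through it
        H = restricted_hessian(loss, point, basis)
        results.append((r, trace_over_norm(H)))
    return results

if __name__ == "__main__":
    for r, ratio in scan_radii(toy_loss, D=13, d=3, radii=[0.1, 1.0, 10.0]):
        print(f"r={r:5.1f}  Tr(H)/||H|| = {ratio:+.3f}")
```

With the Frobenius norm, the ratio is bounded by $\sqrt{d}$ in absolute value and reaches $+\sqrt{d}$ only when the restricted Hessian is a positive multiple of the identity, so values near the upper bound indicate strongly convex behavior on the sampled hyperplane. A single hyperplane per radius is noisy; averaging over many random draws would be needed before reading anything into how the ratio varies with $r$.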
