The Goldilocks zone: Towards better understanding of neural network loss landscapes
Stanislav Fort, Adam Scherlis

TL;DR
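This paper probes neural network loss landscapes on random, low-dimensional hyperplanes and hyperspheres and identifies a 'Goldilocks zone': a spherical shell of parameter-space radii where the Hessian shows unusually high local convexity (an excess of positive curvature). The zone coincides with good initializations, and initializing within it is associated with faster training.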
Contribution
The study introduces the concept of the Goldilocks zone in neural network loss landscapes and links it to initialization quality and training efficiency.
Findings
The Goldilocks zone exhibits an excess of positive eigenvalues of the Hessian.
Measures of local convexity are high in this zone and track how suitable a point is as a network initialization.
Initializing networks within this zone leads to faster training on MNIST.
Abstract
We explore the loss landscape of fully-connected and convolutional neural networks using random, low-dimensional hyperplanes and hyperspheres. Evaluating the Hessian, $H$, of the loss function on these hypersurfaces, we observe 1) an unusual excess of the number of positive eigenvalues of $H$, and 2) a large value of $\mathrm{Tr}(H)/\|H\|$ at a well defined range of configuration space radii, corresponding to a thick, hollow, spherical shell we refer to as the \textit{Goldilocks zone}. We observe this effect for fully-connected neural networks over a range of network widths and depths on MNIST and CIFAR-10 datasets with the $\mathrm{ReLU}$ and $\tanh$ non-linearities, and a similar effect for convolutional networks. Using our observations, we demonstrate a close connection between the Goldilocks zone, measures of local convexity/prevalence of positive curvature, and the suitability of…
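To make the two measurements in the abstract concrete, here is a minimal NumPy sketch of how one might estimate the Hessian restricted to a random low-dimensional hyperplane and compute the fraction of positive eigenvalues and the ratio Tr(H)/||H|| at several radii. It is an illustration under stated assumptions, not the authors' code: the toy one-hidden-layer tanh network, the synthetic data, the finite-difference Hessian, the use of the Frobenius norm for ||H||, and all function names (`tiny_mlp_loss`, `random_orthonormal_basis`, `subspace_hessian`, `convexity_stats`, `radius_sweep`) are hypothetical choices made for this example.

```python
import numpy as np


def tiny_mlp_loss(theta, X, y, hidden=8):
    """Mean squared error of a toy one-hidden-layer tanh network.

    theta is a flat parameter vector of length d_in * hidden + hidden.
    (Toy stand-in for a real network; an assumption of this sketch.)
    """
    d_in = X.shape[1]
    W1 = theta[: d_in * hidden].reshape(d_in, hidden)
    w2 = theta[d_in * hidden : d_in * hidden + hidden]
    pred = np.tanh(X @ W1) @ w2
    return np.mean((pred - y) ** 2)


def random_orthonormal_basis(D, d, rng):
    """Return a D x d matrix with orthonormal columns spanning a random hyperplane."""
    Q, _ = np.linalg.qr(rng.standard_normal((D, d)))
    return Q


def subspace_hessian(loss, theta0, basis, eps=1e-3):
    """Central finite-difference Hessian of loss restricted to theta0 + basis @ z, at z = 0."""
    d = basis.shape[1]
    f = lambda z: loss(theta0 + basis @ z)
    steps = eps * np.eye(d)
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(i, d):
            ei, ej = steps[i], steps[j]
            val = (f(ei + ej) - f(ei - ej) - f(-ei + ej) + f(-ei - ej)) / (4 * eps**2)
            H[i, j] = H[j, i] = val
    return H


def convexity_stats(H):
    """Fraction of positive eigenvalues and Tr(H)/||H|| (Frobenius norm assumed)."""
    eig = np.linalg.eigvalsh(H)
    frac_positive = float(np.mean(eig > 0))
    ratio = float(np.trace(H) / np.linalg.norm(H))
    return frac_positive, ratio


def radius_sweep(radii=(0.1, 0.5, 1.0, 3.0, 10.0), d=5, hidden=8, seed=0):
    """Evaluate both statistics at points of increasing radius along one random direction."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((64, 4))   # synthetic inputs, not MNIST
    y = np.sin(X[:, 0])                # synthetic targets
    D = X.shape[1] * hidden + hidden
    loss = lambda th: tiny_mlp_loss(th, X, y, hidden)
    direction = rng.standard_normal(D)
    direction /= np.linalg.norm(direction)
    basis = random_orthonormal_basis(D, d, rng)
    results = []
    for r in radii:
        H = subspace_hessian(loss, r * direction, basis)
        results.append((r, *convexity_stats(H)))
    return results


if __name__ == "__main__":
    for r, frac_pos, ratio in radius_sweep():
        print(f"radius={r:5.1f}  frac_positive={frac_pos:.2f}  Tr(H)/||H||={ratio:.2f}")
```

For a d-dimensional subspace the ratio is bounded by sqrt(d) in magnitude, reaching sqrt(d) only when all eigenvalues are equal and positive, which is why it can serve as a rough convexity score. A toy network on synthetic data is not expected to reproduce the shell reported in the paper; the sketch only shows the mechanics of the measurement.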
