Asymptotic Smoothing of the Lipschitz Loss Landscape in Overparameterized One-Hidden-Layer ReLU Networks
Saveliy Baturin

TL;DR
This paper analyzes the loss landscape of overparameterized one-hidden-layer ReLU networks, showing it becomes flatter and more connected as the network width increases, with empirical evidence supporting the theoretical findings.
Contribution
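It extends a known connectivity result for the quadratic loss to convex Lipschitz losses with a regularized second layer. It proves that sublevel sets become connected and that the landscape flattens asymptotically as width grows, and it supports both claims with empirical energy-gap measurements.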
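To make the setting concrete, here is a minimal sketch of a regularized objective for a one-hidden-layer ReLU network f(x) = sum_j a_j * ReLU(w_j · x). It rests on two assumptions that are not from the paper. The convex Lipschitz loss is taken to be the logistic loss, which is convex and 1-Lipschitz in the margin y * f(x). The second-layer penalty is taken to be an l2 penalty (lam / 2) * ||a||^2. All function names are hypothetical.

```python
import math


def relu(z):
    return z if z > 0.0 else 0.0


def forward(W, a, x):
    # One hidden layer: f(x) = sum_j a_j * relu(<w_j, x>), with W holding the rows w_j.
    return sum(
        a_j * relu(sum(w_k * x_k for w_k, x_k in zip(w_j, x)))
        for w_j, a_j in zip(W, a)
    )


def logistic_loss(margin):
    # log(1 + exp(-margin)), computed stably for large |margin|.
    # Assumed example of a convex, 1-Lipschitz loss of the margin.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def objective(W, a, X, Y, lam):
    # Mean loss over samples (labels y in {-1, +1}) plus an assumed l2 penalty
    # on the second-layer weights a only; the first layer W is unregularized.
    data = sum(logistic_loss(y * forward(W, a, x)) for x, y in zip(X, Y)) / len(X)
    reg = 0.5 * lam * sum(a_j * a_j for a_j in a)
    return data + reg


# Usage: with zero output weights every prediction is 0, so the loss is log(2).
print(objective([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [[1.0, 2.0], [3.0, -1.0]], [1, -1], 0.1))
```

In this sketch only the output layer is penalized, which mirrors the "regularized second layer" wording. Whether the paper also constrains the first layer is not stated here.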
Findings
Loss landscape connectivity increases with network width.
Energy gaps between local and global minima decrease as width grows.
Empirical data shows wider networks have smaller energy barriers.
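To compare gaps across widths, one option is a one-sided two-sample permutation test on the maximum gap. The statistic is max(narrow gaps) - max(wide gaps), and its observed value is compared against random relabelings of the pooled gaps. The sketch below assumes this form of the statistic and uses hypothetical function names. The example gap values are made up for illustration and are not results from the paper.

```python
import random


def max_gap_statistic(narrow, wide):
    # Positive when the narrow network's worst-case gap exceeds the wide one's.
    return max(narrow) - max(wide)


def permutation_test_max_gap(narrow, wide, n_perm=5000, seed=0):
    rng = random.Random(seed)
    observed = max_gap_statistic(narrow, wide)
    pooled = list(narrow) + list(wide)
    k = len(narrow)
    hits = 0
    for _ in range(n_perm):
        # Randomly reassign gaps to the two width groups and recompute the statistic.
        rng.shuffle(pooled)
        if max_gap_statistic(pooled[:k], pooled[k:]) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    p_value = (hits + 1) / (n_perm + 1)
    return observed, p_value


# Illustrative (made-up) pairwise gaps for a narrow and a wide network.
narrow_gaps = [0.9, 0.8, 1.1, 0.7, 1.0, 0.95]
wide_gaps = [0.1, 0.2, 0.15, 0.05, 0.12, 0.3]
print(permutation_test_max_gap(narrow_gaps, wide_gaps))
```

A maximum-based statistic is sensitive to a single large barrier. That matches the question of whether the worst-case gap shrinks with width, but it has less power than a mean-based test when gaps are noisy.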
Abstract
We study the topology of the loss landscape of one-hidden-layer ReLU networks under overparameterization. On the theory side, we (i) prove that for convex Lipschitz losses with a regularized second layer, every pair of models at the same loss level can be connected by a continuous path along which the loss increases by an arbitrarily small amount (extending a known result for the quadratic loss); and (ii) obtain an asymptotic upper bound on the energy gap between local and global minima that vanishes as the width grows, implying that the landscape flattens and sublevel sets become connected in the limit. Empirically, on a synthetic Moons dataset and on the Wisconsin Breast Cancer dataset, we measure pairwise energy gaps via Dynamic String Sampling (DSS) and find that wider networks exhibit smaller gaps; in particular, a permutation test on the maximum gap yields…
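The idea behind measuring an energy gap with a string method can be shown on a toy problem. Start with the straight segment between two minima. Find the segment whose interior loss most exceeds its endpoints, insert a relaxed midpoint there, and repeat until every segment's excess is below a tolerance. The sketch below is a simplified loop in the spirit of DSS, not the paper's implementation. It swaps the network loss for a hypothetical 2-D energy (x^2 + y^2 - 1)^2 that is zero on the unit circle, and it relaxes each new node with plain gradient descent. The names, step size and tolerance are all assumptions.

```python
def energy(p):
    # Toy stand-in for the loss: zero on the unit circle, positive elsewhere.
    x, y = p
    return (x * x + y * y - 1.0) ** 2


def grad(p):
    x, y = p
    c = 4.0 * (x * x + y * y - 1.0)
    return (c * x, c * y)


def segment_max(a, b, n=50):
    # Largest energy on the straight segment a -> b, sampled at n + 1 points (endpoints included).
    return max(
        energy((a[0] + t / n * (b[0] - a[0]), a[1] + t / n * (b[1] - a[1])))
        for t in range(n + 1)
    )


def relax(p, lr=0.05, steps=200):
    # Plain gradient descent from the inserted node.
    x, y = p
    for _ in range(steps):
        gx, gy = grad((x, y))
        x, y = x - lr * gx, y - lr * gy
    return (x, y)


def refine_path(a, b, tol=1e-3, max_nodes=64):
    path = [a, b]
    while len(path) < max_nodes:
        # Barrier of each segment: interior maximum minus the higher endpoint energy.
        barriers = [
            segment_max(path[i], path[i + 1]) - max(energy(path[i]), energy(path[i + 1]))
            for i in range(len(path) - 1)
        ]
        worst = max(range(len(barriers)), key=barriers.__getitem__)
        if barriers[worst] <= tol:
            break
        p, q = path[worst], path[worst + 1]
        mid = ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)
        path.insert(worst + 1, relax(mid))
    return path


a, b = (1.0, 0.0), (-0.6, 0.8)  # two minima on the circle
print(segment_max(a, b))  # barrier of the straight line between them
path = refine_path(a, b)
print(len(path), max(segment_max(path[i], path[i + 1]) for i in range(len(path) - 1)))
```

Plain gradient descent works here only because this toy energy's gradient is radial, so each relaxed node stays on the bisector of its segment. On a real network loss, an unconstrained relaxed node can drift toward one of the endpoints, and a practical string method may need extra constraints.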
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Complex Network Analysis Techniques · Statistical Methods and Inference
