Asymptotic Smoothing of the Lipschitz Loss Landscape in Overparameterized One-Hidden-Layer ReLU Networks

Saveliy Baturin

arXiv:2602.17596·cs.LG·February 20, 2026

Open Access

TL;DR

This paper analyzes the loss landscape of overparameterized one-hidden-layer ReLU networks, showing it becomes flatter and more connected as the network width increases, with empirical evidence supporting the theoretical findings.

Contribution

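The paper extends a known connectivity result for the quadratic loss to convex $L$-Lipschitz losses with an $\ell_1$-regularized second layer. It also proves an asymptotic upper bound on the energy gap between local and global minima that vanishes as the width grows, supported by empirical energy-gap measurements.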

Findings

01. Sublevel sets of the loss become connected in the limit of large network width.

02. The energy gap between local and global minima admits an upper bound that vanishes as the width grows.

03. On the Moons and Wisconsin Breast Cancer datasets, wider networks exhibit smaller pairwise energy gaps.

Abstract

We study the topology of the loss landscape of one-hidden-layer ReLU networks under overparameterization. On the theory side, we (i) prove that for convex $L$-Lipschitz losses with an $\ell_1$-regularized second layer, every pair of models at the same loss level can be connected by a continuous path within an arbitrarily small loss increase $\epsilon$ (extending a known result for the quadratic loss); (ii) obtain an asymptotic upper bound on the energy gap $\epsilon$ between local and global minima that vanishes as the width $m$ grows, implying that the landscape flattens and sublevel sets become connected in the limit. Empirically, on a synthetic Moons dataset and on the Wisconsin Breast Cancer dataset, we measure pairwise energy gaps via Dynamic String Sampling (DSS) and find that wider networks exhibit smaller gaps; in particular, a permutation test on the maximum gap yields…
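
To make the quantities in the abstract concrete, the following is a minimal sketch, not the paper's code. It assumes the mean absolute error as the convex Lipschitz loss, a hypothetical regularization weight `lam`, and a straight-line path between two parameter sets. The straight line is a deliberately crude stand-in for DSS, which searches for a lower-barrier path, so the value returned here can only overestimate the gap that a better path would achieve. All function names are illustrative.

```python
import numpy as np

def relu_net(params, X):
    """One-hidden-layer ReLU network: f(x) = a^T relu(W x + b)."""
    W, b, a = params
    return np.maximum(X @ W.T + b, 0.0) @ a

def regularized_loss(params, X, y, lam=1e-3):
    """Mean absolute error (convex, 1-Lipschitz in the prediction)
    plus an l1 penalty on the second-layer weights a.
    Both the loss choice and lam are assumptions for illustration."""
    W, b, a = params
    residual = relu_net(params, X) - y
    return np.mean(np.abs(residual)) + lam * np.sum(np.abs(a))

def linear_path_gap(p0, p1, X, y, lam=1e-3, steps=51):
    """Maximum loss along the straight segment from p0 to p1,
    minus the larger of the two endpoint losses.
    This is NOT Dynamic String Sampling; it only bounds the gap from above."""
    ts = np.linspace(0.0, 1.0, steps)
    losses = []
    for t in ts:
        # Interpolate every parameter array (W, b, a) with the same t.
        point = tuple((1.0 - t) * u + t * v for u, v in zip(p0, p1))
        losses.append(regularized_loss(point, X, y, lam))
    return max(losses) - max(losses[0], losses[-1])

def random_params(m, d, rng):
    """Random (W, b, a) for width m and input dimension d (hypothetical init)."""
    return (rng.normal(size=(m, d)), rng.normal(size=m), rng.normal(size=m) / m)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = np.sign(X[:, 0] * X[:, 1])  # toy labels, not the Moons dataset
    for m in (4, 32, 256):
        p0, p1 = random_params(m, 2, rng), random_params(m, 2, rng)
        print(m, linear_path_gap(p0, p1, X, y))
```

The endpoints in this demo are random rather than trained, so the printed numbers only exercise the computation and say nothing about the paper's findings. Reproducing the reported trend would require trained models at matched loss levels and a path search such as DSS.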

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Complex Network Analysis Techniques · Statistical Methods and Inference