Sharp description of local minima in the loss landscape of high-dimensional two-layer ReLU neural networks

Jie Huang; Bruno Loureiro; Stefano Sarao Mannelli

arXiv:2604.09412 · stat.ML · April 13, 2026

TL;DR

This paper characterizes the population loss landscape of high-dimensional two-layer ReLU neural networks in a teacher-student setting. It reveals the structure of local minima, connects them to one-pass SGD dynamics, and shows how overparameterization makes global minima easier to reach.

Contribution

It provides a sharp, interpretable description of local minima in the loss landscape and links these minima to SGD fixed points in a high-dimensional setting.

Findings

01. Local minima have low-dimensional summary-statistic representations.
02. Local minima are attractive fixed points of one-pass SGD in summary-statistic space.
03. Overparameterization connects minima via flat directions, easing convergence to global minima.
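
Finding 01 can be made concrete with a small numerical sketch. Assume standard Gaussian inputs $x \sim \mathcal{N}(0, I_d)$, a teacher $\sum_m \mathrm{ReLU}(v_m^{\top} x)$ of the same form as the student, and the squared loss with a factor 1/2. Under these assumptions, the population loss depends on the $d$-dimensional weights only through the overlap matrices $Q = W W^{\top}$, $M = W V^{\top}$ and $P = V V^{\top}$. The reduction follows from the identity $\mathbb{E}[\mathrm{ReLU}(u)\,\mathrm{ReLU}(v)] = \frac{\sqrt{ab}}{2\pi}\left(\sin\theta + (\pi - \theta)\cos\theta\right)$, where $\cos\theta = c/\sqrt{ab}$, for jointly Gaussian $(u, v)$ with variances $a, b$ and covariance $c$. These overlaps are the usual summary statistics of the teacher-student literature. The paper's exact definitions and normalisation may differ, and the helper names below are hypothetical. The sketch checks the closed form against a Monte Carlo estimate.

```python
import numpy as np


def relu_corr(a, b, c):
    """E[ReLU(u) ReLU(v)] for zero-mean jointly Gaussian (u, v) with
    Var(u) = a, Var(v) = b, Cov(u, v) = c (degree-1 arc-cosine kernel)."""
    norm = np.sqrt(a * b)
    cos_t = np.clip(c / norm, -1.0, 1.0)  # guard against rounding just outside [-1, 1]
    theta = np.arccos(cos_t)
    return norm / (2 * np.pi) * (np.sin(theta) + (np.pi - theta) * cos_t)


def population_loss(Q, M, P):
    """0.5 * E[(f_student - f_teacher)^2] for x ~ N(0, I_d), written only in
    terms of the overlaps Q = W W^T, M = W V^T and P = V V^T."""
    q, p = np.diag(Q), np.diag(P)
    student = relu_corr(q[:, None], q[None, :], Q).sum()
    cross = relu_corr(q[:, None], p[None, :], M).sum()
    teacher = relu_corr(p[:, None], p[None, :], P).sum()
    return 0.5 * (student - 2 * cross + teacher)


def two_layer_relu(A, X):
    """sum_k ReLU(a_k^T x) for every row x of X (second-layer weights fixed to 1)."""
    return np.maximum(X @ A.T, 0.0).sum(axis=1)


rng = np.random.default_rng(0)
d, K, K_star = 50, 3, 2                            # input dim, student width, teacher width
W = rng.standard_normal((K, d)) / np.sqrt(d)       # student first-layer weights
V = rng.standard_normal((K_star, d)) / np.sqrt(d)  # teacher first-layer weights

# Closed form: uses only the K x K, K x K* and K* x K* overlap matrices.
exact = population_loss(W @ W.T, W @ V.T, V @ V.T)

# Monte Carlo estimate of the same population loss from fresh Gaussian inputs.
X = rng.standard_normal((100_000, d))
mc = 0.5 * np.mean((two_layer_relu(W, X) - two_layer_relu(V, X)) ** 2)

print(f"closed form {exact:.4f}   Monte Carlo {mc:.4f}")
```

Because `population_loss` takes only the overlap matrices as input, the loss is a function of $K(K+1)/2 + K K^*$ numbers for any $d$ ($P$ is fixed by the teacher). This is the kind of dimensional reduction that finding 01 describes.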

Abstract

We study the population loss landscape of two-layer ReLU networks of the form $\sum_{k=1}^{K} \mathrm{ReLU}(w_k^{\top} x)$ in a realisable teacher-student setting with Gaussian covariates. We show that local minima admit an exact low-dimensional representation in terms of summary statistics, yielding a sharp and interpretable characterisation of the landscape. We further establish a direct link with one-pass SGD: local minima correspond to attractive fixed points of the dynamics in summary statistics space. This perspective reveals a hierarchical structure of minima: they are typically isolated in the well-specified regime, but become connected by flat directions as network width increases. In this overparameterised regime, global minima become increasingly accessible, attracting the dynamics and reducing convergence to spurious solutions. Overall, our results reveal intrinsic limitations…
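
The link between minima and one-pass SGD described in the abstract can be explored with a short simulation. The sketch below is a generic online-SGD loop, not the authors' code. It makes several assumptions: inputs $x \sim \mathcal{N}(0, I_d)$, a student width equal to the teacher width (taken here as the well-specified case), the squared loss with a factor 1/2, and a learning rate scaled as `lr / d` so that $d$ steps correspond to one unit of time. Every sample is drawn fresh and used once. The summary statistics $Q = W W^{\top}$ and $M = W V^{\top}$ are recorded after each unit of time, and a fixed held-out sample serves as a Monte Carlo proxy for the population loss. The names `two_layer_relu` and `heldout_loss` are hypothetical.

```python
import numpy as np


def two_layer_relu(A, X):
    """sum_k ReLU(a_k^T x) for each row x of X (second-layer weights fixed to 1)."""
    return np.maximum(X @ A.T, 0.0).sum(axis=1)


rng = np.random.default_rng(1)
d, K, K_star = 100, 2, 2   # input dimension, student width, teacher width
lr, T = 0.2, 50            # learning rate and training time in units of d steps
W = rng.standard_normal((K, d)) / np.sqrt(d)       # student weights, rows of norm ~1
V = rng.standard_normal((K_star, d)) / np.sqrt(d)  # teacher weights, rows of norm ~1

# Fixed held-out sample used as a Monte Carlo proxy for the population loss.
X_test = rng.standard_normal((20_000, d))
y_test = two_layer_relu(V, X_test)


def heldout_loss(W):
    return 0.5 * np.mean((two_layer_relu(W, X_test) - y_test) ** 2)


initial_loss = heldout_loss(W)
history = []  # (time, Q, M) snapshots of the summary statistics
for step in range(T * d):
    x = rng.standard_normal(d)  # one-pass: each sample is fresh and used once
    pre = W @ x
    err = np.maximum(pre, 0.0).sum() - np.maximum(V @ x, 0.0).sum()
    # Gradient of 0.5 * err**2 w.r.t. row w_k is err * 1{w_k^T x > 0} * x.
    W -= (lr / d) * err * np.outer((pre > 0).astype(float), x)
    if (step + 1) % d == 0:
        history.append(((step + 1) / d, W @ W.T, W @ V.T))

final_loss = heldout_loss(W)
t, Q, M = history[-1]
print(f"held-out loss {initial_loss:.3f} -> {final_loss:.3f} at t = {t:.0f}")
print("Q =\n", Q.round(3))
print("M =\n", M.round(3))
```

Whether a given run ends near the global minimum or stalls near a spurious fixed point depends on the seed, the widths and the learning rate, so this sketch does not reproduce the paper's results. Setting `K` larger than `K_star` is a quick way to probe the overparameterised regime the abstract discusses, by watching how `Q` and `M` evolve in `history`.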
