The Role of Symmetry in Optimizing Overparameterized Networks

Kusha Sareen; Mohammad Pedramfar; Sékou-Oumar Kaba; Mehran Shakerinava; Siamak Ravanbakhsh

arXiv:2604.25150 · cs.LG · May 11, 2026

TL;DR

This paper studies how the weight-space symmetries introduced by overparameterization shape optimization: they permit better-conditioned minima within each class of functionally identical solutions, and they make global minima more reachable from typical initializations.

Contribution

It provides a theoretical framework linking symmetries, loss landscape geometry, and optimization benefits in overparameterized neural networks.

Findings

01

Symmetries act as diagonal preconditioning on the Hessian.

02

Overparameterization increases the probability mass of global minima near typical initializations, making them more reachable.

03

Empirically, wider networks exhibit lower top eigenvalues, smaller condition numbers, and faster convergence.
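
The first finding can be read through a standard chain-rule identity. The following is a sketch under assumptions not stated on this page, not the paper's own proof: the symbols $L$ (loss), $\theta$ (parameters), $H$ (Hessian of $L$) and $D$ are ours, and the symmetry is assumed to be an invertible diagonal rescaling $D$ of the parameters that leaves the loss unchanged everywhere. Differentiating the invariance twice gives:

```latex
\begin{aligned}
L(D\theta) &= L(\theta) \quad \text{for all } \theta \\
\Rightarrow\quad D\,\nabla L(D\theta) &= \nabla L(\theta) \\
\Rightarrow\quad D\,H(D\theta)\,D &= H(\theta) \\
\Rightarrow\quad H(D\theta) &= D^{-1}\,H(\theta)\,D^{-1}.
\end{aligned}
```

Under these assumptions, moving along a symmetry orbit rescales the rows and columns of the Hessian by a diagonal matrix, which has the same form as diagonal preconditioning. The function computed by the network is unchanged, but the spectrum of $H$ generally is not.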

Abstract

Overparameterization is central to the success of deep learning, yet the mechanisms by which it improves optimization remain incompletely understood. We analyze weight-space symmetries in neural networks and show that overparameterization introduces additional symmetries that benefit optimization in two distinct ways. First, we prove that these symmetries act as a form of diagonal preconditioning on the Hessian, enabling the existence of better-conditioned minima within each equivalence class of functionally identical solutions. Second, we show that overparameterization increases the probability mass of global minima near typical initializations, making these favourable solutions more reachable. These results offer a potential link between loss landscape geometry and simplicity bias. Empirically, we observe wider networks have lower top eigenvalues, smaller condition numbers and faster…
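
To make the preconditioning claim concrete, here is a minimal numerical sketch on a toy model of our own choosing, not one taken from the paper: a scalar two-parameter "network" f(a, b) = a * b with squared loss against a target y. The function names (`hessian_at_minimum`, `rescale`, `top_eigenvalue`) and the constants are hypothetical. The rescaling (a, b) -> (alpha * a, b / alpha) leaves a * b unchanged, so every rescaled point is a functionally identical global minimum, yet the Hessian differs from point to point.

```python
import numpy as np

def hessian_at_minimum(a, b):
    # Toy loss: L(a, b) = 0.5 * (a * b - y)**2.
    # Wherever a * b == y the residual is zero, so the Hessian
    # equals the Gauss-Newton term J J^T with J = [dL-model/da, dL-model/db] = [b, a].
    J = np.array([b, a], dtype=float)
    return np.outer(J, J)

def rescale(a, b, alpha):
    # Symmetry of the toy model: the product a * b is unchanged.
    return alpha * a, b / alpha

def top_eigenvalue(H):
    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order.
    return float(np.linalg.eigvalsh(H)[-1])

a0, b0 = 2.0, 2.0            # balanced minimum for target y = 4
H0 = hessian_at_minimum(a0, b0)

for alpha in (1.0, 2.0, 4.0):
    a, b = rescale(a0, b0, alpha)
    H = hessian_at_minimum(a, b)
    # Inverse of the parameter rescaling diag(alpha, 1/alpha).
    P = np.diag([1.0 / alpha, alpha])
    # The Hessian along the symmetry orbit is a diagonal rescaling of H0.
    assert np.allclose(H, P @ H0 @ P)
    print(f"alpha={alpha:.1f}  a*b={a * b:.1f}  top eigenvalue={top_eigenvalue(H):.2f}")
```

In this toy case the top eigenvalue equals a**2 + b**2, which is smallest at the balanced point and grows as the two parameters become unbalanced. Since the largest stable gradient-descent step size scales inversely with the top eigenvalue, some members of the same equivalence class are easier to optimize around than others. The Hessian here is rank one, so the sketch illustrates only the top eigenvalue, not the condition number.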
