On the Convergence of Overparameterized Problems: Inherent Properties of the Compositional Structure of Neural Networks

Arthur Castello Branco de Oliveira; Dhruv Jatkar; Eduardo Sontag

arXiv:2511.09810 · cs.LG · November 14, 2025

Open Access

TL;DR

This paper explores how the compositional structure of neural networks influences their optimization landscape and training dynamics, revealing universal properties and potential for accelerated convergence in overparameterized models.

Contribution

The paper gives a theoretical analysis of gradient-flow convergence for overparameterized neural networks with linear activations, identifying landscape features that are universal across cost functions and showing how initialization affects convergence speed.

Findings

01

Global convergence properties can be derived for any cost function that is proper and real analytic.

02

For scalar-valued costs, key structural features of the landscape, such as the location and stability of saddle points, are universal across all admissible costs.

03

Initialization can significantly accelerate convergence depending on an imbalance metric.
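
To make the last two findings concrete, here is a worked derivation for the smallest possible case. It is only a sketch under assumptions not taken from the paper: a scalar parameter written as a product of two factors, p = w_1 w_2, a differentiable cost f(p), and the imbalance defined as c = w_1^2 - w_2^2. The paper's own setting and imbalance metric may differ.

```latex
% Assumed toy model: L(w_1, w_2) = f(w_1 w_2), with p = w_1 w_2.
% Gradient flow:
\dot w_1 = -f'(p)\, w_2, \qquad \dot w_2 = -f'(p)\, w_1 .

% The imbalance c = w_1^2 - w_2^2 is conserved along trajectories:
\frac{d}{dt}\bigl(w_1^2 - w_2^2\bigr)
  = 2 w_1 \dot w_1 - 2 w_2 \dot w_2
  = -2 f'(p)\, w_1 w_2 + 2 f'(p)\, w_1 w_2 = 0 .

% Dynamics of the product, using (w_1^2 + w_2^2)^2 = c^2 + 4p^2:
\dot p = \dot w_1 w_2 + w_1 \dot w_2
       = -\bigl(w_1^2 + w_2^2\bigr) f'(p)
       = -\sqrt{c^2 + 4p^2}\; f'(p) .

% The origin is a critical point for every f, with Hessian
\nabla^2 L(0,0) = \begin{pmatrix} 0 & f'(0) \\ f'(0) & 0 \end{pmatrix},
\qquad \text{eigenvalues } \pm f'(0),
% so it is a strict saddle whenever f'(0) \neq 0.
```

In this toy case the saddle at the origin exists regardless of which cost f is chosen, and the factor \sqrt{c^2 + 4p^2} shows how a larger initial imbalance |c| speeds up the evolution of p.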

Abstract

This paper investigates how the compositional structure of neural networks shapes their optimization landscape and training dynamics. We analyze the gradient flow associated with overparameterized optimization problems, which can be interpreted as training a neural network with linear activations. Remarkably, we show that the global convergence properties can be derived for any cost function that is proper and real analytic. We then specialize the analysis to scalar-valued cost functions, where the geometry of the landscape can be fully characterized. In this setting, we demonstrate that key structural features -- such as the location and stability of saddle points -- are universal across all admissible costs, depending solely on the overparameterized representation rather than on problem-specific details. Moreover, we show that convergence can be arbitrarily accelerated depending on…
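
The gradient flow described in the abstract can be illustrated numerically on a toy problem. The sketch below is not the authors' setup: it assumes a scalar two-factor model p = w1 * w2, a quadratic cost 0.5 * (p - target)^2, a forward-Euler discretization with a hypothetical step size `dt`, and the imbalance defined as w1^2 - w2^2. The function names are illustrative only.

```python
import math


def imbalance(w1, w2):
    """Assumed imbalance measure for the two-factor toy model."""
    return w1 * w1 - w2 * w2


def simulate(w1, w2, target=1.0, dt=1e-3, tol=1e-6, max_steps=200_000):
    """Forward-Euler approximation of gradient flow on
    L(w1, w2) = 0.5 * (w1 * w2 - target) ** 2.

    Returns (elapsed_time, w1, w2); elapsed_time is math.inf if the
    product has not reached the target within max_steps.
    """
    for step in range(max_steps):
        err = w1 * w2 - target
        if abs(err) < tol:
            return step * dt, w1, w2
        # Chain rule through the product p = w1 * w2.
        g1, g2 = err * w2, err * w1
        w1, w2 = w1 - dt * g1, w2 - dt * g2
    return math.inf, w1, w2


# Three initializations with the same initial product (0.01)
# but increasing imbalance.
inits = [(0.1, 0.1), (1.0, 0.01), (10.0, 0.001)]
for w1_0, w2_0 in inits:
    t, w1, w2 = simulate(w1_0, w2_0)
    print(
        f"init=({w1_0}, {w2_0})  imbalance={imbalance(w1_0, w2_0):.4f}  "
        f"time={t:.3f}  final imbalance={imbalance(w1, w2):.4f}"
    )
```

In this toy setting all three runs start from the same product, so any difference in convergence time comes from the imbalance of the initialization. The imbalance itself is exactly conserved by the continuous-time flow and only approximately conserved by the Euler steps.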

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks · Machine Learning in Materials Science