On the Convergence of Overparameterized Problems: Inherent Properties of the Compositional Structure of Neural Networks
Arthur Castello Branco de Oliveira, Dhruv Jatkar, Eduardo Sontag

TL;DR
This paper explores how the compositional structure of neural networks influences their optimization landscape and training dynamics, revealing universal properties and potential for accelerated convergence in overparameterized models.
Contribution
The paper gives a theoretical analysis of the convergence properties of overparameterized neural networks with linear activations, highlighting landscape features that are universal across cost functions and the effect of initialization on convergence.
Findings
Global convergence can be established for proper, real analytic cost functions.
For scalar-valued cost functions, the geometry of the landscape, including the location and stability of saddle points, is universal across all admissible costs.
Initialization can significantly accelerate convergence depending on an imbalance metric.
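To make the "universal saddle" finding concrete, here is a short worked calculation for the simplest case we could pick, not the paper's general setting. The model is an assumption: a scalar parameter w written as a product w = ab of two scalar factors, with a smooth cost f whose derivative at zero is nonzero.

```latex
% Assumed toy model: w = ab with a, b scalars, cost f(w), and f'(0) \neq 0.
\[
\nabla_{(a,b)} f(ab) = f'(ab)\begin{pmatrix} b \\ a \end{pmatrix},
\qquad
\nabla^2_{(a,b)} f(ab) =
\begin{pmatrix}
f''(ab)\,b^2 & f''(ab)\,ab + f'(ab) \\
f''(ab)\,ab + f'(ab) & f''(ab)\,a^2
\end{pmatrix}.
\]
% The gradient vanishes at (a,b) = (0,0) for every f. At that point
\[
\nabla^2_{(a,b)} f(ab)\Big|_{(0,0)} = f'(0)\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},
\qquad \text{eigenvalues } \pm f'(0).
\]
```

In this toy case the origin is a critical point introduced by the factorization itself, and it is a strict saddle for any cost with f'(0) ≠ 0. Both its location and its type depend only on the product structure, which matches the flavor of the finding above.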
Abstract
This paper investigates how the compositional structure of neural networks shapes their optimization landscape and training dynamics. We analyze the gradient flow associated with overparameterized optimization problems, which can be interpreted as training a neural network with linear activations. Remarkably, we show that the global convergence properties can be derived for any cost function that is proper and real analytic. We then specialize the analysis to scalar-valued cost functions, where the geometry of the landscape can be fully characterized. In this setting, we demonstrate that key structural features -- such as the location and stability of saddle points -- are universal across all admissible costs, depending solely on the overparameterized representation rather than on problem-specific details. Moreover, we show that convergence can be arbitrarily accelerated depending on…
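The role of initialization can be illustrated with a minimal numerical sketch. Everything in it is an assumption made for illustration: the scalar model w = a*b, the quadratic cost 0.5*(w - 1)^2 (one example of a proper, real analytic cost), the quantity a^2 - b^2 as the imbalance (the paper's exact metric is not stated in this summary), and small-step gradient descent as a stand-in for gradient flow.

```python
import math

TARGET = 1.0  # assumed minimizer of the illustrative cost 0.5 * (w - TARGET)**2


def cost_grad(w):
    """Derivative of the assumed cost 0.5 * (w - TARGET)**2 with respect to w."""
    return w - TARGET


def train(a, b, lr=1e-3, tol=1e-6, max_steps=1_000_000):
    """Small-step gradient descent on f(a*b), approximating gradient flow.

    Returns (steps taken, imbalance at start, imbalance at end), where
    imbalance means a**2 - b**2 (an assumed choice of metric).
    """
    imbalance_start = a * a - b * b
    steps = 0
    while abs(a * b - TARGET) > tol and steps < max_steps:
        g = cost_grad(a * b)
        # Chain rule: d/da f(ab) = f'(ab) * b and d/db f(ab) = f'(ab) * a.
        a, b = a - lr * g * b, b - lr * g * a
        steps += 1
    return steps, imbalance_start, a * a - b * b


# Three initializations with the same product w0 = a0 * b0 but growing imbalance.
w0 = 0.1
results = []
for a0 in (math.sqrt(w0), 1.0, 10.0):
    b0 = w0 / a0
    steps, imb_start, imb_end = train(a0, b0)
    results.append((steps, imb_start, imb_end))
    print(f"a0={a0:.3f} b0={b0:.3f} imbalance={imb_start:.3f} "
          f"steps={steps} imbalance drift={imb_end - imb_start:.2e}")
```

Under gradient flow for this toy model, a^2 - b^2 is conserved and the product evolves as dw/dt = -f'(w)(a^2 + b^2), so a larger imbalance at the same starting product gives a faster rate. The discrete iteration only conserves the imbalance approximately, which is why the script reports the drift.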
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks · Machine Learning in Materials Science
