A Local Polyak-Lojasiewicz and Descent Lemma of Gradient Descent For Overparametrized Linear Models
Ziqing Xu, Hancheng Min, Salma Tarmoun, Enrique Mallada, Rene Vidal

TL;DR
This paper establishes a local Polyak-Łojasiewicz (PL) condition and a local Descent Lemma for gradient descent on overparameterized two-layer linear networks, yielding a linear convergence rate for general losses under relaxed assumptions on the step size, width, and initialization.
Contribution
It introduces a local analysis of the PL condition and the Descent Lemma for overparameterized models, relaxing the usual assumptions on step size, width, and initialization.
Findings
Proves that the PL condition and the Descent Lemma hold locally, with local constants that depend on the weights and the initialization.
Derives a linear convergence rate for GD under relaxed assumptions.
Numerical experiments confirm improved step size choices.
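To make the setting concrete, here is a minimal sketch of plain gradient descent on a two-layer linear network with the squared loss. It is not the paper's method or its step size rule: the dimensions, the fixed step size `eta`, the small random initialization, and the helper name `train_two_layer_linear` are all assumptions chosen for illustration.

```python
import numpy as np


def train_two_layer_linear(X, Y, hidden=10, eta=0.01, steps=2000,
                           init_scale=0.1, seed=0):
    """Plain GD on L(W1, W2) = 0.5 * ||X W1 W2 - Y||_F^2 (illustrative only)."""
    rng = np.random.default_rng(seed)
    d, m = X.shape[1], Y.shape[1]
    # Small random initialization (an assumption, not the paper's scheme).
    W1 = init_scale * rng.standard_normal((d, hidden))
    W2 = init_scale * rng.standard_normal((hidden, m))

    losses = []
    for _ in range(steps):
        R = X @ W1 @ W2 - Y                 # residual, shape (n, m)
        losses.append(0.5 * np.sum(R ** 2))
        G1 = X.T @ R @ W2.T                 # gradient w.r.t. W1, shape (d, hidden)
        G2 = W1.T @ X.T @ R                 # gradient w.r.t. W2, shape (hidden, m)
        # Simultaneous update with a fixed step size.
        W1 = W1 - eta * G1
        W2 = W2 - eta * G2

    R = X @ W1 @ W2 - Y
    losses.append(0.5 * np.sum(R ** 2))     # loss after the final step
    return W1, W2, np.array(losses)


# Synthetic, realizable data so that the minimum loss is zero.
data_rng = np.random.default_rng(1)
X = data_rng.standard_normal((20, 5)) / np.sqrt(20)
Theta = data_rng.standard_normal((5, 3))
Y = X @ Theta

W1, W2, losses = train_two_layer_linear(X, Y)
print("initial loss:", losses[0], "final loss:", losses[-1])
```

Plotting `losses` on a log scale is one way to inspect whether the decrease looks linear (geometric) once the iterates leave the neighborhood of the small initialization; how fast that happens depends on the assumed step size and data.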
Abstract
Most prior work on the convergence of gradient descent (GD) for overparameterized neural networks relies on strong assumptions on the step size (infinitesimal), the hidden-layer width (infinite), or the initialization (large, spectral, balanced). Recent efforts to relax these assumptions focus on two-layer linear networks trained with the squared loss. In this work, we derive a linear convergence rate for training two-layer linear neural networks with GD for general losses and under relaxed assumptions on the step size, width, and initialization. A key challenge in deriving this result is that classical ingredients for deriving convergence rates for nonconvex problems, such as the Polyak-Łojasiewicz (PL) condition and Descent Lemma, do not hold globally for overparameterized neural networks. Here, we prove that these two conditions hold locally with local constants that depend on the…
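For reference, the classical global forms of the two ingredients named in the abstract can be written as follows. The symbols $f$, $f^*$, $\mu$, $L$, $\eta$, and $x_t$ are standard textbook notation assumed here, not necessarily the paper's; the paper's results are local analogues of these statements.

```latex
% PL condition with constant \mu > 0:
\frac{1}{2}\,\|\nabla f(x)\|^2 \;\ge\; \mu\,\bigl(f(x) - f^*\bigr)

% Descent Lemma for an L-smooth f and the GD step x_{t+1} = x_t - \eta \nabla f(x_t):
f(x_{t+1}) \;\le\; f(x_t) - \eta\Bigl(1 - \frac{L\eta}{2}\Bigr)\|\nabla f(x_t)\|^2

% Combining the two for 0 < \eta < 2/L gives a linear rate:
f(x_{t+1}) - f^* \;\le\; \Bigl(1 - 2\mu\eta\Bigl(1 - \frac{L\eta}{2}\Bigr)\Bigr)\bigl(f(x_t) - f^*\bigr)
```

With $\eta = 1/L$ the contraction factor reduces to $1 - \mu/L$, which is why the sizes of the PL and smoothness constants along the iterates govern the rate.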
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Domain Adaptation and Few-Shot Learning
