A Local Polyak-Lojasiewicz and Descent Lemma of Gradient Descent For Overparametrized Linear Models

Ziqing Xu; Hancheng Min; Salma Tarmoun; Enrique Mallada; Rene Vidal

arXiv:2505.11664 · cs.LG · May 20, 2025

Open Access

TL;DR

This paper establishes a local Polyak-Lojasiewicz condition and descent lemma for gradient descent on overparameterized linear neural networks, enabling a linear convergence rate under relaxed assumptions.

Contribution

It introduces a novel local analysis of PL and descent conditions for overparameterized models, relaxing traditional assumptions on step size, width, and initialization.

Findings

1. Proves that the PL condition and Descent Lemma hold locally, with constants that depend on the weights and the initialization.

2. Derives a linear convergence rate for GD under relaxed assumptions on the step size, width, and initialization.

3. Numerical experiments confirm the improved step size choices.
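To make the setting in these findings concrete, here is a minimal sketch of plain GD on a two-layer linear network. It is an illustration under stated assumptions, not the paper's experiment: the squared loss stands in for the general losses the paper covers, and the function names, data sizes, hidden width, step size, and initialization scale are all hypothetical choices made only so the example runs.

```python
import numpy as np


def make_data(n=20, d=5, m=3, seed=0):
    """Synthetic realizable regression data (hypothetical setup, not from the paper)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d)) / np.sqrt(n)   # scaled so X^T X is close to identity
    W_true = rng.standard_normal((d, m))
    return X, X @ W_true                           # targets are exactly fit by a linear map


def gd_two_layer_linear(X, Y, hidden=16, step=0.005, iters=4000,
                        init_scale=0.5, seed=1):
    """Plain GD on L(W1, W2) = 0.5 * ||X W1 W2 - Y||_F^2.

    Returns the loss recorded before each of the `iters` updates.
    """
    rng = np.random.default_rng(seed)
    d, m = X.shape[1], Y.shape[1]
    W1 = init_scale * rng.standard_normal((d, hidden))
    W2 = init_scale * rng.standard_normal((hidden, m))
    losses = []
    for _ in range(iters):
        R = X @ W1 @ W2 - Y                 # residual
        losses.append(0.5 * np.sum(R ** 2))
        G = X.T @ R                         # gradient w.r.t. the product W1 W2
        g1 = G @ W2.T                       # chain rule: gradient w.r.t. W1
        g2 = W1.T @ G                       # chain rule: gradient w.r.t. W2
        W1 = W1 - step * g1                 # simultaneous update of both layers
        W2 = W2 - step * g2
    return np.array(losses)


X, Y = make_data()
losses = gd_two_layer_linear(X, Y)
print(f"initial loss {losses[0]:.3e}, final loss {losses[-1]:.3e}")
```

Plotting `np.log(losses)` against the iteration index is one way to check for a linear rate by eye: a roughly straight, downward-sloping line corresponds to geometric decay of the loss. How large `step` can be before the loss stops decreasing depends on the current weights, which is the kind of dependence the local analysis is concerned with.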

Abstract

Most prior work on the convergence of gradient descent (GD) for overparameterized neural networks relies on strong assumptions on the step size (infinitesimal), the hidden-layer width (infinite), or the initialization (large, spectral, balanced). Recent efforts to relax these assumptions focus on two-layer linear networks trained with the squared loss. In this work, we derive a linear convergence rate for training two-layer linear neural networks with GD for general losses and under relaxed assumptions on the step size, width, and initialization. A key challenge in deriving this result is that classical ingredients for deriving convergence rates for nonconvex problems, such as the Polyak-Łojasiewicz (PL) condition and Descent Lemma, do not hold globally for overparameterized neural networks. Here, we prove that these two conditions hold locally with local constants that depend on the…
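For reference, the two classical ingredients named in the abstract can be written in their textbook global form as below. The notation is assumed here and is not taken from the paper: $\mu$ is the PL constant, $K$ the smoothness constant, $\eta$ the step size, and $L^*$ the minimum loss. The paper's local versions replace these global constants with local ones.

```latex
% Textbook (global) statements; \mu, K, \eta, and L^* are notation assumed for this sketch.
\begin{align*}
\text{PL condition:}\quad
  & \tfrac{1}{2}\,\|\nabla L(\theta)\|^2 \;\ge\; \mu\,\bigl(L(\theta) - L^*\bigr), \\
\text{Descent Lemma:}\quad
  & L\bigl(\theta - \eta\nabla L(\theta)\bigr) \;\le\; L(\theta)
    - \eta\Bigl(1 - \tfrac{K\eta}{2}\Bigr)\|\nabla L(\theta)\|^2, \\
\text{Combined } (\eta \le 2/K):\quad
  & L(\theta_{t+1}) - L^* \;\le\; \bigl(1 - 2\mu\eta + \mu K\eta^2\bigr)\bigl(L(\theta_t) - L^*\bigr).
\end{align*}
```

The third line follows by substituting the PL bound into the Descent Lemma. With $\eta = 1/K$ the contraction factor becomes $1 - \mu/K$, which is the familiar linear rate when both conditions hold globally.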

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Domain Adaptation and Few-Shot Learning
