Learning Provably Improves the Convergence of Gradient Descent

Qingyu Song; Wei Lin; Hong Xu

arXiv:2501.18092·cs.LG·December 24, 2025

Learning Provably Improves the Convergence of Gradient Descent

Qingyu Song, Wei Lin, Hong Xu

PDF

Open Access 1 Video

TL;DR

This paper proves the convergence of a learned optimizer for quadratic programming, bridging the theoretical gap in Learn to Optimize methods, and demonstrates significant improvements over traditional gradient descent and existing L2O approaches.

Contribution

It provides the first rigorous convergence proof for L2O models learning GD hyperparameters, using NTK theory and a deterministic initialization strategy.

Findings

01

L2O models trained with the proposed method outperform GD by over 50% in optimality.

02

The approach demonstrates superior robustness compared to state-of-the-art L2O methods.

03

Theoretical analysis supports stable training over long horizons.

Abstract

Learn to Optimize (L2O) trains deep neural network-based solvers for optimization, achieving success in accelerating convex problems and improving non-convex solutions. However, L2O lacks rigorous theoretical backing for its own training convergence, as existing analyses often use unrealistic assumptions -- a gap this work highlights empirically. We bridge this gap by proving the training convergence of L2O models that learn Gradient Descent (GD) hyperparameters for quadratic programming, leveraging the Neural Tangent Kernel (NTK) theory. We propose a deterministic initialization strategy to support our theoretical results and promote stable training over extended optimization horizons by mitigating gradient explosion. Our L2O framework demonstrates over 50% better optimality than GD and superior robustness over state-of-the-art L2O methods on synthetic datasets. The code of our method…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

Learning Provably Improves the Convergence of Gradient Descent· slideslive

Taxonomy

TopicsNeural Networks and Applications

MethodsALIGN