Dual Space Preconditioning for Gradient Descent in the Overparameterized Regime

Reza Ghane; Danil Akhtiamov; Babak Hassibi

arXiv:2603.10485 · stat.ML · March 19, 2026

Open Access

TL;DR

This paper analyzes the convergence and implicit bias of Dual Space Preconditioned Gradient Descent, a family that includes optimizers such as Adam, in over-parameterized linear models. It introduces a novel version of the Bregman divergence and characterizes the points to which the iterates converge.

Contribution

It proves convergence of dual space preconditioned gradient descent in over-parameterized models and characterizes the implicit bias for isotropic preconditioners.

Findings

01

The iterates are guaranteed to converge to a point $W_{\infty}$ satisfying $X W_{\infty} = Y$.

02

For isotropic preconditioners, the implicit bias selects the solution that minimizes the Frobenius-norm distance from the initialization.

03

General preconditioners approximate this bias up to a constant factor.
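
Findings 02 and 03 can be written out as a sketch, reading "minimizes the Frobenius norm difference from initialization" as a constrained problem over interpolating solutions. The symbol $W_0$ for the initialization and the constant $C \ge 1$ are notation assumed here. The second line is only one natural reading of "up to a constant factor"; the precise statement and the value of the constant should be taken from the paper itself.

$$W_{\infty} = \arg\min_{W \,:\, XW = Y} \|W - W_0\|_F \quad \text{(isotropic preconditioners)}$$

$$\|W_{\infty} - W_0\|_F \le C \min_{W \,:\, XW = Y} \|W - W_0\|_F \quad \text{(general preconditioners, assumed form)}$$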

Abstract

In this work, we study the convergence properties of Dual Space Preconditioned Gradient Descent, which encompasses optimizers such as Normalized Gradient Descent, Gradient Clipping, and Adam. We consider preconditioners of the form $\nabla K$, where $K : \mathbb{R}^{p} \to \mathbb{R}$ is convex, and assume that the method is applied to train an over-parameterized linear model with a loss of the form $\ell(XW - Y)$, for weights $W \in \mathbb{R}^{d \times k}$, labels $Y \in \mathbb{R}^{n \times k}$, and data $X \in \mathbb{R}^{n \times d}$. Under these assumptions, we prove that the iterates of the preconditioned gradient descent always converge to a point $W_{\infty} \in \mathbb{R}^{d \times k}$ satisfying $X W_{\infty} = Y$. Our proof techniques are of independent interest, as we introduce a novel version of the Bregman divergence with accompanying identities that…
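
The setting in the abstract can be illustrated with a small numerical sketch. The abstract does not spell out the update rule, so the code assumes the usual form of dual space preconditioning, $W_{t+1} = W_t - \eta \, \nabla K(\nabla_W \ell(X W_t - Y))$. The squared loss, the choice $K(G) = \sqrt{1 + \|G\|_F^2} - 1$ (a smooth stand-in for gradient clipping that depends only on the norm of its argument, which is what this sketch takes "isotropic" to mean), the step size, and all function and variable names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def grad_K(G):
    # Gradient of the assumed K(G) = sqrt(1 + ||G||_F^2) - 1 (convex, norm-only).
    return G / np.sqrt(1.0 + np.sum(G * G))

def dual_preconditioned_gd(X, Y, W0, step, num_steps):
    # Assumed update: W <- W - step * grad_K(gradient of the loss at W).
    W = W0.copy()
    for _ in range(num_steps):
        G = X.T @ (X @ W - Y)  # gradient of 0.5 * ||XW - Y||_F^2 (assumed loss)
        W = W - step * grad_K(G)
    return W

rng = np.random.default_rng(0)
n, d, k = 5, 20, 3  # over-parameterized: fewer samples than features
X = rng.standard_normal((n, d))
Y = rng.standard_normal((n, k))
W0 = rng.standard_normal((d, k))

# Step size 1 / (largest eigenvalue of X^T X); a conservative illustrative choice.
step = 1.0 / np.linalg.norm(X, 2) ** 2
W_inf = dual_preconditioned_gd(X, Y, W0, step, num_steps=20000)

# Interpolating solution closest to W0 in Frobenius norm, via the pseudoinverse.
W_closest = W0 + np.linalg.pinv(X) @ (Y - X @ W0)

residual = np.linalg.norm(X @ W_inf - Y)
gap = np.linalg.norm(W_inf - W_closest)
print(f"||X W - Y||_F = {residual:.2e}, distance to closest interpolator = {gap:.2e}")
```

With this norm-only choice of $K$, every update is a scalar multiple of $X^{\top}(XW - Y)$, so $W_t - W_0$ stays in the row space of $X$. That is why the limit in this toy run coincides with the interpolating solution closest to the initialization.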

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Sparse and Compressive Sensing Techniques