Generalization Guarantees on Data-Driven Tuning of Gradient Descent with Langevin Updates

Saumya Goyal; Rohith Rongali; Ritabrata Ray; Barnabás Póczos

arXiv:2604.13130·cs.LG·April 16, 2026


TL;DR

This paper introduces Langevin Gradient Descent (LGD), an algorithm that approximates the posterior mean for convex regression tasks, and gives generalization guarantees for meta-learning its hyperparameters, with empirical validation on few-shot linear regression.

Contribution

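It proposes LGD, proves that some hyperparameter configuration makes it Bayes' optimal for squared loss, and bounds the pseudo-dimension of tuning its hyperparameters, extending prior elastic net results (which only allow $h = 2$) to general hyperparameter dimension $h$.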

Findings

01

For squared loss, there exists a hyperparameter configuration under which LGD achieves the Bayes' optimal solution.

02

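Meta-learning hyperparameters for LGD has a pseudo-dimension bound of $O(dh)$, up to logarithmic terms and under mild assumptions, for $d$ parameters and hyperparameter dimension $h$.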

03

Empirical results show LGD's effectiveness in few-shot linear regression.

Abstract

We study learning to learn for regression problems through the lens of hyperparameter tuning. We propose the Langevin Gradient Descent Algorithm (LGD), which approximates the mean of the posterior distribution defined by the loss function and regularizer of a convex regression task. We prove the existence of an optimal hyperparameter configuration for which the LGD algorithm achieves the Bayes' optimal solution for squared loss. Subsequently, we study generalization guarantees on meta-learning optimal hyperparameters for the LGD algorithm from a given set of tasks in the data-driven setting. For a number of parameters $d$ and hyperparameter dimension $h$, we show a pseudo-dimension bound of $O(dh)$, up to logarithmic terms, under mild assumptions on LGD. This matches the dimensional dependence of the bounds obtained in prior work for the elastic net, which only allows for $h = 2$ …
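
To make the idea of "approximating the mean of the posterior defined by the loss and regularizer" concrete, here is a minimal sketch of gradient descent with Langevin noise on a ridge-regularized squared loss. It is an illustration under stated assumptions, not the paper's LGD algorithm: the excerpt does not give LGD's update rule or its hyperparameterization, so the step size `eta`, inverse temperature `beta`, regularization weight `lam`, the burn-in scheme, and the helper names are all hypothetical choices. For this Gaussian case the posterior mean coincides with the closed-form ridge solution, which gives a simple sanity check.

```python
import numpy as np


def make_demo_task(n=50, d=3, noise_std=0.1, seed=0):
    """Synthetic linear regression task (hypothetical demo data)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    w_true = rng.standard_normal(d)
    y = X @ w_true + noise_std * rng.standard_normal(n)
    return X, y


def langevin_gd(X, y, lam=1.0, eta=None, beta=1.0,
                n_steps=20000, burn_in=2000, seed=0):
    """Average of noisy gradient iterates on U(w) = 0.5*||Xw - y||^2 + 0.5*lam*||w||^2.

    The iterates approximately sample from the density proportional to
    exp(-beta * U(w)); their average estimates that distribution's mean.
    All hyperparameters here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    if eta is None:
        # Step size 1/L, where L is the smoothness constant of U.
        eta = 1.0 / (np.linalg.norm(X, 2) ** 2 + lam)
    w = np.zeros(d)
    running_sum = np.zeros(d)
    for t in range(n_steps):
        grad = X.T @ (X @ w - y) + lam * w
        noise = rng.standard_normal(d)
        # Gradient step plus Gaussian noise scaled by sqrt(2 * eta / beta).
        w = w - eta * grad + np.sqrt(2.0 * eta / beta) * noise
        if t >= burn_in:
            running_sum += w
    return running_sum / (n_steps - burn_in)


def ridge_closed_form(X, y, lam=1.0):
    """Closed-form ridge solution, equal to the Gaussian posterior mean here."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


if __name__ == "__main__":
    X, y = make_demo_task()
    w_avg = langevin_gd(X, y)
    w_ridge = ridge_closed_form(X, y)
    print("Langevin average:", w_avg)
    print("Ridge solution:  ", w_ridge)
    print("Max abs diff:    ", np.max(np.abs(w_avg - w_ridge)))
```

In a data-driven tuning setup, quantities such as `lam`, `eta`, and `beta` would be the kind of hyperparameters selected across a collection of tasks; how the paper parameterizes them is not stated in this excerpt.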
