Improved high-dimensional estimation with Langevin dynamics and stochastic weight averaging

Stanley Wei; Alex Damian; Jason D. Lee

arXiv:2603.06028·cs.LG·March 9, 2026

Open Access · 3 Reviews

TL;DR

This paper demonstrates that Langevin dynamics with iterate averaging can recover hidden signals in high-dimensional models at optimal sample complexity, emulating landscape smoothing effects without explicit smoothing.

Contribution

The paper shows that Langevin dynamics with averaged iterates achieves optimal sample complexity in high-dimensional estimation, matching the smoothed-landscape results without explicit smoothing.
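To make the comparison concrete, the sketch below tabulates the two sample-complexity exponents mentioned on this page: $\max(1, k^{\star} - 1)$ for online SGD (from the abstract) and $\max(1, k^{\star}/2)$ for the smoothed landscape (as quoted in Reviewer 01's comments). The function names are hypothetical, and constants and logarithmic factors are ignored.

```python
def online_sgd_exponent(k_star: int) -> float:
    """Exponent a in n ~ d^a for online SGD: max(1, k* - 1)."""
    return float(max(1, k_star - 1))


def smoothed_exponent(k_star: int) -> float:
    """Exponent a in n ~ d^a for the smoothed landscape: max(1, k*/2)."""
    return max(1.0, k_star / 2)


def exponent_gap(k_star: int) -> float:
    """Powers of d saved relative to online SGD (assumed rates, see lead-in)."""
    return online_sgd_exponent(k_star) - smoothed_exponent(k_star)


if __name__ == "__main__":
    # One row per information exponent: k*, SGD exponent, smoothed exponent, gap.
    for k in range(1, 6):
        print(k, online_sgd_exponent(k), smoothed_exponent(k), exponent_gap(k))
```

Under these assumed rates the gap is zero for $k^{\star} \le 2$ and equals $k^{\star}/2 - 1$ beyond that, so the improvement only appears when $k^{\star} > 2$.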

Findings

01

Langevin dynamics succeeds with $n \sim d^{k^{\star}/2}$ samples when the iterates are averaged.

02

Iterate averaging combined with noise mimics landscape smoothing effects.

03

As a potential extension, minibatch SGD may also reach similar rates without added noise.
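The mechanics behind these findings (noise injection, staying on the sphere, and averaging the iterates) can be sketched on a toy problem. This is a minimal illustration under loud assumptions, not the paper's procedure: the loss $L(\theta) = -\lambda \langle \theta, \theta^{\star} \rangle$ is a hypothetical stand-in with information exponent 1, the Euler discretization with renormalization is an assumed scheme (the reviews note that the paper's analysis is in continuous time), and all names and parameter values are illustrative.

```python
import math
import random


def normalize(v):
    """Rescale v to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project_tangent(theta, g):
    """Remove the component of g along the unit vector theta."""
    c = dot(theta, g)
    return [gi - c * ti for gi, ti in zip(g, theta)]


def averaged_langevin(theta_star, signal=100.0, step=1e-3,
                      n_steps=2000, burn_in=200, seed=0):
    """Toy discretized spherical Langevin dynamics with iterate averaging.

    Assumed toy loss: L(theta) = -signal * <theta, theta_star>.
    Returns (last_iterate, normalized_average_of_iterates).
    """
    rng = random.Random(seed)
    d = len(theta_star)
    # Start on the "equator": a random unit vector orthogonal to theta_star.
    theta = normalize(project_tangent(theta_star, [rng.gauss(0, 1) for _ in range(d)]))
    running_sum = [0.0] * d
    for t in range(n_steps):
        # Negative Riemannian gradient of the toy loss at theta.
        drift = project_tangent(theta, [signal * s for s in theta_star])
        # Gaussian noise restricted to the tangent space.
        noise = project_tangent(theta, [rng.gauss(0, 1) for _ in range(d)])
        theta = normalize([
            th + step * dr + math.sqrt(2 * step) * nz
            for th, dr, nz in zip(theta, drift, noise)
        ])
        if t >= burn_in:
            running_sum = [s + th for s, th in zip(running_sum, theta)]
    return theta, normalize(running_sum)


if __name__ == "__main__":
    d = 20
    theta_star = [1.0] + [0.0] * (d - 1)
    last, avg = averaged_langevin(theta_star)
    print("last iterate overlap:", dot(last, theta_star))
    print("averaged overlap:    ", dot(avg, theta_star))
```

In this toy setting the noise components orthogonal to $\theta^{\star}$ largely cancel in the average, so the normalized average tends to align with $\theta^{\star}$ more closely than a typical single iterate does. It does not reproduce the hard $k^{\star} > 2$ regime the paper studies.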

Abstract

Significant recent work has studied the ability of gradient descent to recover a hidden planted direction $\theta^{\star} \in S^{d - 1}$ in different high-dimensional settings, including tensor PCA and single-index models. The key quantity that governs the ability of gradient descent to traverse these landscapes is the information exponent $k^{\star}$ (Ben Arous et al., 2021), which corresponds to the order of the saddle at initialization in the population landscape. Ben Arous et al. (2021) showed that $n \gtrsim d^{\max(1, k^{\star} - 1)}$ samples were necessary and sufficient for online SGD to recover $\theta^{\star}$, and Ben Arous et al. (2020) proved a similar lower bound for Langevin dynamics. More recently, Damian et al. (2023) showed it was possible to circumvent these lower bounds by running gradient descent on a smoothed landscape, and that this algorithm succeeds with $n \gtrsim…
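For a single-index model, the information exponent of a given link function can be estimated numerically. The sketch below assumes the standard definition from the single-index literature, which the truncated abstract does not spell out: $k^{\star}$ is the smallest $k \ge 1$ for which the probabilists' Hermite coefficient $\mathbb{E}[\sigma(z)\,\mathrm{He}_k(z)]$, $z \sim \mathcal{N}(0, 1)$, is nonzero. The function names, integration grid, and tolerance are illustrative choices.

```python
import math


def hermite_he(k: int, x: float) -> float:
    """Probabilists' Hermite polynomial He_k(x), via He_{j+1} = x*He_j - j*He_{j-1}."""
    prev, curr = 1.0, x
    if k == 0:
        return prev
    for j in range(1, k):
        prev, curr = curr, x * curr - j * prev
    return curr


def hermite_coefficient(link, k: int, half_width: float = 12.0, steps: int = 4800) -> float:
    """Approximate E[link(z) * He_k(z)] for z ~ N(0, 1) with the trapezoid rule."""
    h = 2 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end-point weights
        density = math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)
        total += weight * link(x) * hermite_he(k, x) * density
    return total * h


def information_exponent(link, max_order: int = 10, tol: float = 1e-8):
    """Smallest k >= 1 with a numerically nonzero Hermite coefficient, else None."""
    for k in range(1, max_order + 1):
        if abs(hermite_coefficient(link, k)) > tol:
            return k
    return None


if __name__ == "__main__":
    print(information_exponent(math.tanh))               # odd link with a linear component
    print(information_exponent(lambda x: x * x))          # even link
    print(information_exponent(lambda x: x**3 - 3 * x))   # He_3 itself
```

The tolerance decides what counts as "nonzero", so a link whose low-order coefficients are tiny but not exactly zero would need a tighter grid and a problem-specific threshold.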

Peer Reviews

Decision: ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

The paper addresses the problem of learning the implanted direction with a Langevin algorithm, even though this was previously considered inefficient and shows that it is not. The paper nicely explains that it does not solve the problem of escaping from the "equator" for individual samples, but only once the averaging over the samples is performed. It also demonstrates this with numerical simulations.

Weaknesses

It is not clear what the implications of this work are or what a practitioner should take away from these findings. As far as I understand, Langevin dynamics only boosts performance when the information exponent k is larger than 2 (otherwise max(1, k-1) = 1, and so is max(1, k/2)). It is not clear to me how relevant the k > 2 case is, especially if the single-index model is viewed as a toy model of a neural network. In Example 1, k = 1 and k = 2 cases are often encountered, while cases with k

Reviewer 02 · Rating 6 · Confidence 4

Strengths

The paper conveys an interesting check about the possibility to dynamically smooth the landscape in a number of hard inference problems. It is robust in terms of derivations and conclusions.

Weaknesses

Not sure that it is relevant to Machine Learning applications where landscape is expected to be way smoother already thanks to overparametrization and large datasets. The paper is not very innovative both concerning ideas and execution. I do not need to specify further the very rich literature on the same subject already cited in the paper and in other parts of this review. Also little parts of the proof are contained in other papers (as explicitly mentioned in the text). I suggest the authors

Reviewer 03 · Rating 6 · Confidence 3

Strengths

The idea that averaging iterates can replace explicit loss smoothing to recover a planted signal with optimal sample complexity in the online regime is novel and conceptually appealing. The paper provides a clear theoretical treatment of this idea in the context of Langevin dynamics on the sphere, supported by sound derivations and proofs.

Weaknesses

(1) The analysis is limited to the continuous-time setting, and it remains unclear how a discretized version of the proposed dynamics would perform in practice or whether it would preserve the same sample complexity guarantees. (2) While the idea of using iterate averaging could potentially lead to simplified algorithms, the specific procedure analyzed in this work is not practical. It requires oracle knowledge of the information exponent $k^\star$, and depending on its parity, a different esti

Taxonomy

Topics: Markov Chains and Monte Carlo Methods · Stochastic Gradient Optimization Techniques · Generative Adversarial Networks and Image Synthesis