SGD with Dependent Data: Optimal Estimation, Regret, and Inference

Yinan Shen; Yichen Zhang; Wen-Xin Zhou

arXiv:2601.01371·math.ST·January 6, 2026



Open Access

TL;DR

This paper analyzes the performance of stochastic gradient descent (SGD) with dependent data, establishing optimal estimation, regret bounds, and asymptotic normality, even under complex dependence structures and unbounded covariates.

Contribution

The paper provides a comprehensive analysis of SGD under dependent data, extending classical results to non-stationary, non-mixing, and decision-dependent settings, and proposes new algorithms for sparse regression.

Findings

1. SGD achieves optimal estimation error and regret under dependent data.

2. Tail bounds remain sharp for infinite horizon settings.

3. Asymptotic distribution of SGD iterates is Gaussian with an $O(1/\sqrt{t})$ remainder.

Abstract

This work investigates the performance of the final iterate produced by stochastic gradient descent (SGD) under temporally dependent data. We consider two complementary sources of dependence: (i) martingale-type dependence in both the covariate and noise processes, which accommodates non-stationary and non-mixing time series data, and (ii) dependence induced by sequential decision making. Our formulation runs in parallel with classical notions of (local) stationarity and strong mixing, while neither framework fully subsumes the other. Remarkably, SGD is shown to automatically accommodate both independent and dependent information under a broad class of stepsize schedules and exploration rate schemes. Non-asymptotically, we show that SGD simultaneously achieves statistically optimal estimation error and regret, extending and improving existing results. In particular, our tail…
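To make the setting in source (i) concrete, here is a minimal sketch of final-iterate SGD on a linear model with temporally dependent data. It is not the paper's algorithm or its exact assumptions: the AR(1) covariates (which are in fact stationary and mixing, chosen only for simplicity), the ARCH-style noise whose scale depends on the previous noise value, the stepsize `c / (t + t0)`, and every function and parameter name are assumptions made for illustration.

```python
import math
import random


def sgd_final_iterate(T=20000, d=5, rho=0.5, c=2.0, t0=50, seed=0):
    """Run SGD on a linear model with dependent covariates and noise.

    All names and default values are illustrative assumptions.
    Returns (final iterate, true parameter).
    """
    rng = random.Random(seed)
    theta_star = [1.0 / math.sqrt(d)] * d  # hypothetical true parameter
    theta = [0.0] * d                      # SGD starts at the origin
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    prev_eps = 0.0
    innov = math.sqrt(1.0 - rho ** 2)      # keeps each coordinate at unit variance

    for t in range(1, T + 1):
        # Dependent covariates: each coordinate follows an AR(1) recursion.
        x = [rho * xi + innov * rng.gauss(0.0, 1.0) for xi in x]

        # Dependent noise: mean zero given the past (a martingale difference),
        # with a scale that depends on the previous noise value.
        sigma = 0.5 * math.sqrt(1.0 + min(prev_eps ** 2, 4.0))
        eps = sigma * rng.gauss(0.0, 1.0)
        y = sum(xi * si for xi, si in zip(x, theta_star)) + eps

        # One SGD step on the squared loss 0.5 * (x^T theta - y)^2.
        residual = sum(xi * ti for xi, ti in zip(x, theta)) - y
        step = c / (t + t0)                # assumed decaying stepsize schedule
        theta = [ti - step * residual * xi for ti, xi in zip(theta, x)]
        prev_eps = eps

    return theta, theta_star


def l2_error(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


if __name__ == "__main__":
    theta, theta_star = sgd_final_iterate()
    print("final-iterate L2 error:", l2_error(theta, theta_star))
```

The sketch reports only the error of the last iterate, with no averaging, to mirror the abstract's focus on the final iterate. It does not model source (ii), dependence induced by sequential decision making.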


Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference · Advanced Bandit Algorithms Research