SGD with Dependent Data: Optimal Estimation, Regret, and Inference
Yinan Shen, Yichen Zhang, Wen-Xin Zhou

TL;DR
This paper analyzes the performance of stochastic gradient descent (SGD) with dependent data, establishing optimal estimation error, regret bounds, and asymptotic normality, even under complex dependence structures and unbounded covariates.
Contribution
It provides a comprehensive analysis of SGD under dependent data, extending classical results to non-stationary, non-mixing, and decision-dependent settings, and proposes new algorithms for sparse regression.
Findings
SGD achieves optimal estimation error and regret under dependent data.
Tail bounds remain sharp in the infinite-horizon setting.
The SGD iterates are asymptotically Gaussian, with an $O(1/\sqrt{t})$ remainder term.
Abstract
This work investigates the performance of the final iterate produced by stochastic gradient descent (SGD) under temporally dependent data. We consider two complementary sources of dependence: martingale-type dependence in both the covariate and noise processes, which accommodates non-stationary and non-mixing time series data, and dependence induced by sequential decision making. Our formulation runs in parallel with classical notions of (local) stationarity and strong mixing, while neither framework fully subsumes the other. Remarkably, SGD is shown to automatically accommodate both independent and dependent information under a broad class of stepsize schedules and exploration rate schemes. Non-asymptotically, we show that SGD simultaneously achieves statistically optimal estimation error and regret, extending and improving existing results. In particular, our tail…
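To make the setting concrete, here is a minimal Python sketch of the final SGD iterate on temporally dependent data. It is not the authors' algorithm or experiment. It assumes a linear regression model with squared loss, covariates generated by a stationary VAR(1) process (so consecutive samples are correlated), i.i.d. Gaussian noise (a simple martingale-difference sequence), and a decaying stepsize $\eta_t = \eta_0/(t + t_0)$ that does not depend on the sample size, in the spirit of an infinite-horizon schedule. The function names `simulate_ar1_regression` and `sgd_final_iterate` and all parameter values are hypothetical illustrations.

```python
import numpy as np


def simulate_ar1_regression(n, d, rho, theta_star, rng):
    """Hypothetical data generator (not from the paper).

    Covariates follow x_t = rho * x_{t-1} + sqrt(1 - rho^2) * e_t with e_t ~ N(0, I),
    started from N(0, I), so every x_t is marginally N(0, I) but consecutive x_t are
    dependent. Responses are y_t = x_t^T theta_star + noise, with i.i.d. N(0, 0.25) noise.
    """
    X = np.empty((n, d))
    x = rng.standard_normal(d)
    scale = np.sqrt(1.0 - rho**2)
    for t in range(n):
        x = rho * x + scale * rng.standard_normal(d)
        X[t] = x
    y = X @ theta_star + 0.5 * rng.standard_normal(n)
    return X, y


def sgd_final_iterate(X, y, eta0=1.0, t0=20.0):
    """One-pass SGD on the squared loss; returns the last iterate (no averaging).

    The stepsize eta_t = eta0 / (t + t0) is an assumed schedule that does not
    depend on the total number of samples.
    """
    theta = np.zeros(X.shape[1])
    for t, (x_t, y_t) in enumerate(zip(X, y)):
        eta_t = eta0 / (t + t0)
        grad = (x_t @ theta - y_t) * x_t  # gradient of 0.5 * (x_t^T theta - y_t)^2
        theta -= eta_t * grad
    return theta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta_star = np.ones(5)
    X, y = simulate_ar1_regression(20000, 5, 0.5, theta_star, rng)
    # Estimation error of the final iterate after processing the first n samples.
    for n in (500, 5000, 20000):
        err = np.linalg.norm(sgd_final_iterate(X[:n], y[:n]) - theta_star)
        print(f"n={n:6d}  ||theta_n - theta*|| = {err:.4f}")
```

In this toy run the error of the final iterate should shrink as more dependent samples are processed, even though no samples are discarded or thinned to break the dependence. The choice $\eta_0 = 1$ matters here: for a $1/t$-type schedule, the product of $\eta_0$ and the smallest eigenvalue of $\mathbb{E}[x_t x_t^\top]$ (equal to 1 in this sketch) is commonly taken above $1/2$ to obtain the $O(1/t)$ mean-squared-error rate.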
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference · Advanced Bandit Algorithms Research
