Online Covariance Estimation in Averaged SGD: Improved Batch-Mean Rates and Minimax Optimality via Trajectory Regression
Yijin Ni, Xiaoming Huo

TL;DR
This paper improves online covariance estimation for averaged SGD: a per-block bias analysis sharpens the batch-means rate, and a trajectory regression method attains the minimax optimal rate, all without Hessian access.
Contribution
It introduces a bias-tuned batch-means estimator and a trajectory regression method that attain minimax optimal covariance estimation rates in an online setting.
Findings
Re-tuning the block-growth parameter improves the batch-means convergence rate to O(n^{-(1-eta)/3})
Trajectory regression achieves the minimax rate matching the lower bound
The modified estimator requires no Hessian access and maintains O(d^2) memory
Abstract
We study online covariance matrix estimation for Polyak-Ruppert averaged stochastic gradient descent (SGD). The online batch-means estimator of Zhu, Chen and Wu (2023) achieves an operator-norm convergence rate of , which yields at the optimal learning-rate exponent . A rigorous per-block bias analysis reveals that re-tuning the block-growth parameter improves the batch-means rate to O(n^{-(1-eta)/3}), achieving . The modified estimator requires no Hessian access and preserves O(d^2) memory. We provide a complete error decomposition into variance, stationarity bias, and nonlinearity bias components. A weighted-averaging variant that avoids hard truncation is also discussed. We establish the minimax rate for Hessian-free covariance estimation from the SGD trajectory: a Le Cam…
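To make the batch-means idea concrete, here is a minimal sketch of the classical equal-size, non-overlapping batch-means covariance estimate applied to an SGD trajectory. It is not the growing-block online estimator of Zhu, Chen and Wu, nor the re-tuned variant studied in the paper. The function names `batch_means_cov` and `sgd_iterates`, the toy objective f(x) = 0.5 * ||x||^2, the noise model, and all parameter values are assumptions made for illustration. The sketch also stores the full trajectory, so it does not have the O(d^2) memory footprint of an online method.

```python
import random


def batch_means_cov(iterates, batch_size):
    """Equal-size batch-means estimate of the long-run covariance of the iterates.

    Estimates the limiting covariance of sqrt(n) * (average - target)
    from non-overlapping batch means. Needs no Hessian information.
    """
    d = len(iterates[0])
    num_batches = len(iterates) // batch_size
    if num_batches < 2:
        raise ValueError("need at least two full batches")
    used = iterates[: num_batches * batch_size]  # drop the incomplete tail

    # Average over the used window (the Polyak-Ruppert average of those iterates).
    grand = [sum(x[i] for x in used) / len(used) for i in range(d)]

    cov = [[0.0] * d for _ in range(d)]
    for j in range(num_batches):
        block = used[j * batch_size:(j + 1) * batch_size]
        mean_j = [sum(x[i] for x in block) / batch_size for i in range(d)]
        dev = [mean_j[i] - grand[i] for i in range(d)]
        # Accumulate the outer product of the batch-mean deviation.
        for r in range(d):
            for c in range(d):
                cov[r][c] += dev[r] * dev[c]

    scale = batch_size / (num_batches - 1)
    return [[scale * v for v in row] for row in cov]


def sgd_iterates(n, d=2, step0=0.5, alpha=0.75, seed=0):
    """Toy SGD on f(x) = 0.5 * ||x||^2 with additive N(0, I) gradient noise.

    Hypothetical setup: step size step0 * k^(-alpha); the minimizer is 0.
    """
    rng = random.Random(seed)
    x = [1.0] * d
    out = []
    for k in range(1, n + 1):
        step = step0 * k ** (-alpha)
        x = [xi - step * (xi + rng.gauss(0.0, 1.0)) for xi in x]
        out.append(x)
    return out


trajectory = sgd_iterates(20000)
estimate = batch_means_cov(trajectory, batch_size=500)
for row in estimate:
    print([round(v, 3) for v in row])
```

In this toy problem the correlation between successive iterates lengthens as the step size decays, so a fixed batch size can become too short for late iterates. That is the motivation for schemes whose block lengths grow along the trajectory, and the block-growth parameter is the quantity the paper re-tunes.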
