Bias-Variance Tradeoff in a Sliding Window Implementation of the Stochastic Gradient Algorithm
Yakup Ceki Papo

TL;DR
This paper introduces a framework for analyzing stochastic gradient algorithms in a mean squared error (MSE) sense using the asymptotic normality of the iterates, and proposes the sliding window SGD (SW-SGD) method, which incurs a lower MSE than standard SGD on quadratic and convex problems.
Contribution
The paper develops an analysis framework for biased, reduced-variance gradient estimators and introduces the SW-SGD algorithm, with a proof of convergence and a lower MSE than SGD on quadratic and convex problems.
Findings
SW-SGD achieves lower MSE than SGD on quadratic and convex problems.
The framework characterizes the mean and variance of the iterate distribution in terms of the bias and covariance matrix of the gradient estimators.
Numerical results are presented in support of SW-SGD's lower MSE relative to standard SGD.
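The bias and covariance view in the findings above rests on the standard decomposition of mean squared error, sketched here for reference. The symbols are assumptions chosen for illustration and are not taken from the paper: x_k denotes the k-th iterate and x^* the minimizer.

```latex
\mathbb{E}\,\lVert x_k - x^* \rVert^2
  = \underbrace{\lVert \mathbb{E}[x_k] - x^* \rVert^2}_{\text{squared bias}}
  + \underbrace{\operatorname{tr}\bigl(\operatorname{Cov}(x_k)\bigr)}_{\text{total variance}}
```

Under this identity, a biased gradient estimator can still yield a lower MSE than an unbiased one if the reduction in the variance term outweighs the added squared bias.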
Abstract
This paper provides a framework to analyze stochastic gradient algorithms in a mean squared error (MSE) sense using the asymptotic normality result of the stochastic gradient descent (SGD) iterates. We perform this analysis by taking the asymptotic normality result and applying it to the finite iteration case. Specifically, we look at problems where the gradient estimators are biased and have reduced variance and compare the iterates generated by these gradient estimators to the iterates generated by the SGD algorithm. We use the work of Fabian to characterize the mean and the variance of the distribution of the iterates in terms of the bias and the covariance matrix of the gradient estimators. We introduce the sliding window SGD (SW-SGD) algorithm, with its proof of convergence, which incurs a lower MSE than the SGD algorithm on quadratic and convex problems. Lastly, we present some…
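The abstract does not spell out the SW-SGD update, so the following is only a minimal sketch under an assumed reading: the step direction is the average of the most recent stochastic gradients held in a fixed-size window. Older gradients in the window were evaluated at earlier iterates, which introduces bias, while averaging reduces variance. The function names, the quadratic test problem, the 1/k step size, and the noise model are all hypothetical choices for illustration, not the author's setup.

```python
from collections import deque

import numpy as np


def run_sliding_window_sgd(window, steps=1000, dim=5, noise=1.0, seed=0):
    """Assumed sliding-window variant of SGD on f(x) = 0.5 * x^T A x.

    window=1 reduces to plain SGD. The minimizer is x* = 0.
    """
    rng = np.random.default_rng(seed)
    A = np.diag(np.linspace(1.0, 2.0, dim))  # hypothetical quadratic problem
    x = np.ones(dim)
    recent = deque(maxlen=window)  # keeps only the last `window` gradients
    for k in range(1, steps + 1):
        # Noisy gradient at the current iterate (additive Gaussian noise assumed).
        g = A @ x + noise * rng.standard_normal(dim)
        recent.append(g)
        # Average the stored gradients; stale entries make this estimator biased.
        direction = np.mean(list(recent), axis=0)
        x = x - (1.0 / k) * direction
    return x


def estimate_mse(window, trials=20, **kwargs):
    """Monte Carlo estimate of E||x_k - x*||^2 over independent seeds."""
    errors = [
        np.sum(run_sliding_window_sgd(window, seed=s, **kwargs) ** 2)
        for s in range(trials)
    ]
    return float(np.mean(errors))


if __name__ == "__main__":
    for w in (1, 5, 10):
        print(f"window={w}: estimated MSE = {estimate_mse(w):.5f}")
```

Running the script prints an empirical MSE for each window size. Whether a larger window helps in this toy setting depends on the step size, noise level, and problem conditioning, so the output should not be read as a reproduction of the paper's results.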
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods
Methods: Stochastic Gradient Descent
