Bias-Variance Tradeoff in a Sliding Window Implementation of the Stochastic Gradient Algorithm

Yakup Ceki Papo

arXiv:1910.11868 · stat.ML · October 28, 2019 · 1 citation

Open Access

TL;DR

This paper introduces a framework for analyzing stochastic gradient algorithms using the asymptotic normality of the SGD iterates, and proposes the sliding window SGD (SW-SGD) method, which incurs a lower mean squared error than standard SGD on quadratic and convex problems.

Contribution

The paper develops a new analysis framework for biased gradient estimators and introduces the SW-SGD algorithm with proven convergence and improved MSE performance.

Findings

1. SW-SGD incurs a lower MSE than SGD on quadratic and convex problems.

2. The framework characterizes the mean and the variance of the distribution of the iterates in terms of the bias and the covariance matrix of the gradient estimators.

3. Numerical results support the lower MSE of SW-SGD relative to standard SGD.
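
The MSE comparison in these findings rests on the standard bias-variance decomposition of the mean squared error. As a reminder, and using notation that is assumed here rather than taken from the paper (x_k for the iterate after k steps, x^* for the minimizer), it can be written as:

```latex
\mathbb{E}\,\lVert x_k - x^* \rVert^2
  = \underbrace{\lVert \mathbb{E}[x_k] - x^* \rVert^2}_{\text{squared bias}}
  + \underbrace{\operatorname{tr}\bigl(\operatorname{Cov}(x_k)\bigr)}_{\text{variance}}
```

A biased gradient estimator can therefore yield a lower MSE than an unbiased one whenever the reduction in the variance term outweighs the added squared bias.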

Abstract

This paper provides a framework to analyze stochastic gradient algorithms in a mean squared error (MSE) sense using the asymptotic normality result of the stochastic gradient descent (SGD) iterates. We perform this analysis by taking the asymptotic normality result and applying it to the finite iteration case. Specifically, we look at problems where the gradient estimators are biased and have reduced variance and compare the iterates generated by these gradient estimators to the iterates generated by the SGD algorithm. We use the work of Fabian to characterize the mean and the variance of the distribution of the iterates in terms of the bias and the covariance matrix of the gradient estimators. We introduce the sliding window SGD (SW-SGD) algorithm, with its proof of convergence, which incurs a lower MSE than the SGD algorithm on quadratic and convex problems. Lastly, we present some…
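
The abstract does not spell out the SW-SGD update, so the following is only a minimal sketch of one plausible reading: the step direction is the average of the most recent noisy gradients, some of which were evaluated at earlier iterates (a source of bias) while averaging reduces the noise (lower variance). The function name `squared_error`, the objective f(x) = 0.5 * ||x||^2, the 1/k step size, and the Gaussian noise model are all assumptions made for illustration, not details from the paper.

```python
from collections import deque

import numpy as np


def squared_error(window, steps=2000, dim=5, noise=1.0, seed=0):
    """Return ||x_final - x*||^2 for one run on f(x) = 0.5 * ||x||^2 (x* = 0).

    window=1 is plain SGD. window>1 averages the last `window` noisy
    gradients, which were computed at older iterates (assumed update rule).
    """
    rng = np.random.default_rng(seed)
    x = np.ones(dim)
    recent = deque(maxlen=window)  # sliding window of past gradient estimates

    for k in range(1, steps + 1):
        # Noisy gradient of f at the current iterate (assumed noise model).
        g = x + noise * rng.standard_normal(dim)
        recent.append(g)
        # Window average: lower variance, but uses stale gradients.
        g_bar = sum(recent) / len(recent)
        x = x - (1.0 / k) * g_bar  # assumed 1/k step size

    return float(np.sum(x ** 2))


if __name__ == "__main__":
    for w in (1, 4):
        print(w, squared_error(w))
```

Each call gives the squared error of a single run; estimating the MSE for a given window size would require averaging this quantity over many seeds.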

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods

Methods: Stochastic Gradient Descent