Stochastic Gradient Descent with Biased but Consistent Gradient Estimators
Jie Chen, Ronny Luss

TL;DR
This paper shows that stochastic gradient descent (SGD) with biased but consistent gradient estimators converges in the same manner as SGD with unbiased estimators, for both convex and nonconvex problems, which broadens the settings where SGD applies.
Contribution
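It provides a theoretical analysis showing that consistent gradient estimators can replace unbiased ones in SGD without changing the convergence behavior. This matters in settings such as graph data, where an unbiased estimator may be as expensive to compute as the full gradient.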
Findings
Consistent estimators achieve convergence behavior similar to that of unbiased estimators.
Experimental validation on synthetic and real data supports the theory.
Opens new research directions for efficient SGD in large-scale graph settings.
Abstract
Stochastic gradient descent (SGD), which dates back to the 1950s, is one of the most popular and effective approaches for performing stochastic optimization. Research on SGD resurged recently in machine learning for optimizing convex loss functions and training nonconvex deep neural networks. The theory assumes that one can easily compute an unbiased gradient estimator, which is usually the case due to the sample average nature of empirical risk minimization. There exist, however, many scenarios (e.g., graphs) where an unbiased estimator may be as expensive to compute as the full gradient because training examples are interconnected. Recently, Chen et al. (2018) proposed using a consistent gradient estimator as an economic alternative. Encouraged by empirical success, we show, in a general setting, that consistent estimators result in the same convergence behavior as do unbiased ones.…
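To make "biased but consistent" concrete, here is a minimal Python sketch. It is not the authors' method or experiment: the toy objective f(w) = 0.5 * (w - MU^2)^2, the Gaussian data distribution, the 1/(t+1) step size, and the names consistent_grad and run_sgd are all assumptions chosen for illustration. The gradient depends on the square of a population mean, so plugging in a sample mean gives an estimate whose bias is SIGMA^2 / n in magnitude and shrinks to zero as the sample size n grows.

```python
import numpy as np

MU, SIGMA = 2.0, 1.0   # hypothetical data distribution N(MU, SIGMA^2)
TARGET = MU ** 2       # minimizer of the toy objective f(w) = 0.5 * (w - MU^2)^2


def consistent_grad(w, samples):
    """Plug-in estimate of f'(w) = w - MU^2 using the sample mean.

    E[mean^2] = MU^2 + SIGMA^2 / n, so the estimate is biased for finite n,
    but the bias vanishes as n grows (consistency).
    """
    return w - np.mean(samples) ** 2


def run_sgd(n_samples, steps=2000, seed=0):
    """Run SGD on the toy objective with the plug-in gradient estimate."""
    rng = np.random.default_rng(seed)
    w = 0.0
    for t in range(steps):
        samples = rng.normal(MU, SIGMA, size=n_samples)
        # Diminishing step size 1/(t+1), an assumption of this sketch.
        w -= consistent_grad(w, samples) / (t + 1)
    return w


err_small = abs(run_sgd(n_samples=2) - TARGET)
err_large = abs(run_sgd(n_samples=1000) - TARGET)
print(f"error with n=2:    {err_small:.4f}")
print(f"error with n=1000: {err_large:.4f}")
```

With only two samples per step the iterate settles near MU^2 + SIGMA^2 / 2 rather than MU^2, while with a thousand samples per step the residual bias is negligible. This is the sense in which a consistent estimator can stand in for an unbiased one once the sample size is large enough.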
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Privacy-Preserving Technologies in Data
Methods: Stochastic Gradient Descent
