Stochastic Gradient Descent with Biased but Consistent Gradient Estimators

Jie Chen; Ronny Luss

arXiv:1807.11880 · cs.LG · December 24, 2019 · 26 cites

TL;DR

This paper shows that stochastic gradient descent with biased but consistent gradient estimators exhibits convergence behavior similar to that of SGD with unbiased estimators, for both convex and nonconvex optimization problems, which broadens the settings in which SGD can be applied.

Contribution

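The paper provides a theoretical analysis showing that consistent gradient estimators can replace unbiased ones in SGD without changing the convergence behavior. This matters in settings such as graph data, where an unbiased estimator can be as expensive to compute as the full gradient.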

Findings

01

SGD with consistent gradient estimators achieves convergence behavior similar to that of SGD with unbiased estimators.

02

Experimental validation on synthetic and real data supports the theory.

03

The work opens new research directions for efficient SGD in large-scale graph settings.

Abstract

Stochastic gradient descent (SGD), which dates back to the 1950s, is one of the most popular and effective approaches for performing stochastic optimization. Research on SGD resurged recently in machine learning for optimizing convex loss functions and training nonconvex deep neural networks. The theory assumes that one can easily compute an unbiased gradient estimator, which is usually the case due to the sample average nature of empirical risk minimization. There exist, however, many scenarios (e.g., graphs) where an unbiased estimator may be as expensive to compute as the full gradient because training examples are interconnected. Recently, Chen et al. (2018) proposed using a consistent gradient estimator as an economic alternative. Encouraged by empirical success, we show, in a general setting, that consistent estimators result in the same convergence behavior as do unbiased ones.…
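
To make the distinction concrete: an estimator is unbiased if its expectation equals the true gradient at every sample size, and consistent if it converges in probability to the true gradient as the sample size grows. The sketch below is a toy illustration under our own assumptions, not the paper's experiments or the FastGCN setup. It uses a hypothetical objective f(w) = 0.5 * (w - mu^2)^2, where mu is the mean of a dataset. Replacing mu with the mean of n sampled points gives a plug-in gradient whose bias is sigma^2 / n: nonzero for every finite n, but vanishing as n grows. The names `consistent_grad` and `sgd`, the 1/k step size, and the data distribution are all illustrative choices.

```python
import numpy as np


def consistent_grad(w, sample):
    """Plug-in gradient of the toy objective f(w) = 0.5 * (w - mu**2)**2.

    mu is replaced by the sample mean. Since E[mean(sample)**2] equals
    mu**2 + sigma**2 / n, the estimate is biased for every finite n,
    but the bias shrinks to zero as n grows (consistency).
    """
    return w - np.mean(sample) ** 2


def sgd(data, n, steps=2000, seed=0):
    """Run SGD with the plug-in gradient, drawing n points per step."""
    rng = np.random.default_rng(seed)
    w = 0.0
    for k in range(1, steps + 1):
        sample = rng.choice(data, size=n, replace=True)
        w -= (1.0 / k) * consistent_grad(w, sample)  # diminishing step size
    return w


# Assumed toy data: 10,000 draws from a normal distribution.
data = np.random.default_rng(123).normal(loc=2.0, scale=1.0, size=10_000)
target = np.mean(data) ** 2  # minimizer of the toy objective

for n in (2, 20, 200):
    w_final = sgd(data, n)
    print(f"n={n:4d}  |w - w*| = {abs(w_final - target):.4f}")
```

In this toy problem the iterate settles near mu^2 + sigma^2 / n rather than mu^2, so the gap to the true minimizer is governed by the per-step sample size n. That is the qualitative point of the abstract: a consistent estimator can stand in for an unbiased one provided the sample is large enough.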

Code & Models

Repositories

jiechenjiechen/FastGCN-matlab
tf

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Privacy-Preserving Technologies in Data

Methods: Stochastic Gradient Descent