PAGE: A Simple and Optimal Probabilistic Gradient Estimator for Nonconvex Optimization

Zhize Li, Hongyan Bao, Xiangliang Zhang, Peter Richtárik

arXiv:2008.10898 · cs.LG · June 15, 2021 · 22 citations

TL;DR

PAGE is a simple stochastic gradient estimator for nonconvex optimization whose convergence rates are optimal, matching the paper's lower bounds; in deep learning experiments it converges faster than SGD and reaches higher test accuracy.

Contribution

The paper presents PAGE, a new gradient estimator with a simple implementation and optimal convergence rates, together with tight lower bounds for nonconvex finite-sum and online optimization.

Findings

01

PAGE achieves optimal convergence rates that match the paper's lower bounds.

02

PAGE converges faster than SGD in deep learning experiments.

03

PAGE attains higher test accuracy in practical neural network training.

Abstract

In this paper, we propose a novel stochastic gradient estimator -- ProbAbilistic Gradient Estimator (PAGE) -- for nonconvex optimization. PAGE is easy to implement as it is designed via a small adjustment to vanilla SGD: in each iteration, PAGE uses the vanilla minibatch SGD update with probability $p_{t}$ or reuses the previous gradient with a small adjustment, at a much lower computational cost, with probability $1 - p_{t}$. We give a simple formula for the optimal choice of $p_{t}$. Moreover, we prove the first tight lower bound $\Omega(n + \frac{\sqrt{n}}{\epsilon^{2}})$ for nonconvex finite-sum problems, which also leads to a tight lower bound $\Omega(b + \frac{\sqrt{b}}{\epsilon^{2}})$ for nonconvex online problems, where $b := \min\{\frac{\sigma^{2}}{\epsilon^{2}}, n\}$. Then, we show that PAGE obtains the optimal convergence results $O(n + \frac{\sqrt{n}}{\epsilon^{2}})$ (finite-sum) and…
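The two-branch update described in the abstract can be sketched in a few lines. What follows is a minimal, illustrative Python sketch, not the authors' implementation. It runs PAGE on a hypothetical noise-free least-squares finite sum $f(x) = \frac{1}{n}\sum_i \frac{1}{2}(a_i^\top x - y_i)^2$, chosen only because convergence is easy to check (it is convex, not a nonconvex benchmark), and the helper names `grad_i`, `avg_grad`, `page`, and `full_grad_norm` are hypothetical. It assumes the low-cost branch is $g_{t+1} = g_t + \frac{1}{b'}\sum_{i \in I'}(\nabla f_i(x_{t+1}) - \nabla f_i(x_t))$ with $|I'| = b'$, that the SGD branch draws a fresh minibatch of size $b$, and that $p = b'/(b + b')$ with $b' \approx \sqrt{b}$, which is how we recall the paper's recommended setting; treat these parameter choices as assumptions.

```python
import math
import random


def grad_i(x, a, y):
    """Gradient of the component f_i(x) = 0.5 * (a . x - y)^2, i.e. (a . x - y) * a."""
    r = sum(aj * xj for aj, xj in zip(a, x)) - y
    return [r * aj for aj in a]


def avg_grad(x, idx, A, Y):
    """Average of the component gradients grad f_i(x) over the indices in idx."""
    g = [0.0] * len(x)
    for i in idx:
        for j, gij in enumerate(grad_i(x, A[i], Y[i])):
            g[j] += gij
    return [gj / len(idx) for gj in g]


def page(A, Y, x0, eta, b, b_prime, p, T, seed=0):
    """Hypothetical PAGE sketch: T steps x_{t+1} = x_t - eta * g_t with the PAGE estimator g_t."""
    rng = random.Random(seed)
    n = len(A)
    x = list(x0)
    # Initial estimator: minibatch gradient of size b (b = n gives the full gradient).
    g = avg_grad(x, rng.sample(range(n), b), A, Y)
    for _ in range(T):
        x_new = [xj - eta * gj for xj, gj in zip(x, g)]
        if rng.random() < p:
            # With probability p: fresh minibatch gradient of size b (vanilla SGD branch).
            g = avg_grad(x_new, rng.sample(range(n), b), A, Y)
        else:
            # With probability 1 - p: reuse g, corrected by gradient differences on a
            # small minibatch of size b_prime, using the SAME indices at both points.
            idx = rng.sample(range(n), b_prime)
            g_new = avg_grad(x_new, idx, A, Y)
            g_old = avg_grad(x, idx, A, Y)
            g = [gj + gn - go for gj, gn, go in zip(g, g_new, g_old)]
        x = x_new
    # Nonconvex guarantees of this kind are usually stated for a randomly chosen
    # iterate; this sketch simply returns the last one.
    return x


def full_grad_norm(x, A, Y):
    """Euclidean norm of the full gradient (1/n) * sum_i grad f_i(x)."""
    return math.sqrt(sum(gj * gj for gj in avg_grad(x, range(len(A)), A, Y)))


if __name__ == "__main__":
    # Hypothetical synthetic data: n = 200 components in d = 5 dimensions, no noise.
    rng = random.Random(1)
    n, d = 200, 5
    x_true = [rng.gauss(0.0, 1.0) for _ in range(d)]
    A = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    Y = [sum(aj * xj for aj, xj in zip(a, x_true)) for a in A]

    b = n                         # large batch: the full gradient in the finite-sum case
    b_prime = int(math.sqrt(b))   # small batch b' ~ sqrt(b) (assumed setting)
    p = b_prime / (b + b_prime)   # assumed choice p = b' / (b + b')
    x0 = [0.0] * d
    x = page(A, Y, x0, eta=0.05, b=b, b_prime=b_prime, p=p, T=600)
    print("grad norm at x0:", full_grad_norm(x0, A, Y))
    print("grad norm after PAGE:", full_grad_norm(x, A, Y))
```

The saving comes from the cost per step: the SGD branch evaluates $b$ component gradients and the correction branch evaluates $2b'$, so with $p = b'/(b + b')$ the expected cost per step is $p\,b + (1 - p)\,2b' = \frac{3bb'}{b + b'} < 3b'$, far below $b$ when $b' \approx \sqrt{b}$.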

Videos

PAGE: A Simple and Optimal Probabilistic Gradient Estimator for Nonconvex Optimization (SlidesLive)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and Algorithms

Methods: Convolution · Dense Connections · Dropout · Max Pooling · Softmax · Stochastic Gradient Descent