PAGE: A Simple and Optimal Probabilistic Gradient Estimator for Nonconvex Optimization
Zhize Li, Hongyan Bao, Xiangliang Zhang, Peter Richtárik

TL;DR
PAGE is a simple stochastic gradient estimator for nonconvex optimization. Its convergence rates match the lower bounds proved in the paper, and in deep learning experiments it converges faster than SGD.
Contribution
The paper presents PAGE, a novel gradient estimator with optimal convergence bounds and a simple implementation, along with tight lower bounds for nonconvex optimization.
Findings
PAGE achieves optimal convergence rates matching lower bounds.
PAGE converges faster than SGD in deep learning experiments.
PAGE attains higher test accuracy than SGD in practical neural network training.
Abstract
In this paper, we propose a novel stochastic gradient estimator -- ProbAbilistic Gradient Estimator (PAGE) -- for nonconvex optimization. PAGE is easy to implement as it is designed via a small adjustment to vanilla SGD: in each iteration, PAGE uses the vanilla minibatch SGD update with probability $p_t$ or reuses the previous gradient with a small adjustment, at a much lower computational cost, with probability $1-p_t$. We give a simple formula for the optimal choice of $p_t$. Moreover, we prove the first tight lower bound $\Omega(n+\frac{\sqrt{n}}{\epsilon^2})$ for nonconvex finite-sum problems, which also leads to a tight lower bound $\Omega(b+\frac{\sqrt{b}}{\epsilon^2})$ for nonconvex online problems, where $b:=\min\{\frac{\sigma^2}{\epsilon^2}, n\}$. Then, we show that PAGE obtains the optimal convergence results $O(n+\frac{\sqrt{n}}{\epsilon^2})$ (finite-sum) and…
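The update rule described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names (`make_toy_problem`, `minibatch_grad`, `page`), the toy least-squares objective, the step size, and the batch sizes are all assumptions chosen for the example. The switching probability used in the usage note, p = b' / (b + b') with small batch size b' = sqrt(b), is likewise treated here as an assumed setting rather than something quoted from the text above.

```python
import numpy as np


def make_toy_problem(n=64, d=5, seed=0):
    """Hypothetical finite-sum problem: f_i(x) = 0.5 * (a_i . x - y_i)^2."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(n, d))
    x_true = rng.normal(size=d)
    return A, A @ x_true


def minibatch_grad(A, y, x, idx):
    """Average gradient of the components f_i, i in idx, evaluated at x."""
    residual = A[idx] @ x - y[idx]
    return A[idx].T @ residual / len(idx)


def page(grad, x0, n, steps, lr, b, b_small, p, seed=0):
    """Sketch of a PAGE-style loop.

    grad(x, idx) must return the average gradient over component indices idx.
    With probability p the estimator is refreshed with a minibatch of size b;
    otherwise the previous estimator is reused and corrected with a gradient
    difference computed on a cheaper minibatch of size b_small.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    g = grad(x, rng.choice(n, size=b, replace=False))  # initial estimator
    for _ in range(steps):
        x_new = x - lr * g  # plain gradient-style step with the estimator
        if rng.random() < p:
            # Vanilla minibatch SGD estimator at the new point.
            idx = rng.choice(n, size=b, replace=False)
            g = grad(x_new, idx)
        else:
            # Reuse the previous estimator plus a small adjustment, using
            # the same sampled indices at both the new and the old point.
            idx = rng.choice(n, size=b_small, replace=False)
            g = g + grad(x_new, idx) - grad(x, idx)
        x = x_new
    return x
```

A possible way to call it, assuming the full dataset as the large batch: set `b = n`, `b_small = int(np.sqrt(n))`, `p = b_small / (b + b_small)`, and pass `lambda x, idx: minibatch_grad(A, y, x, idx)` as `grad`. The toy objective is convex and is only meant to show the mechanics of the two branches; it says nothing about the nonconvex rates discussed in the abstract.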
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and Algorithms
Methods: Convolution · Dense Connections · Dropout · Max Pooling · Softmax · Stochastic Gradient Descent
