Larger is Better: The Effect of Learning Rates Enjoyed by Stochastic Optimization with Progressive Variance Reduction

Fanhua Shang

arXiv:1704.04966·cs.LG·April 18, 2017·1 cites

TL;DR

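This paper introduces VR-SGD, a variant of SVRG that sets each epoch's snapshot point to the average iterate and its starting point to the last iterate of the previous epoch. This choice permits larger learning rates than SVRG (e.g., 3/(7L) vs. 1/(10L)) and, together with separate update rules for smooth and non-smooth problems, yields faster convergence in stochastic optimization tasks.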

Contribution

VR-SGD employs larger step sizes and distinct update rules for smooth and non-smooth problems, improving convergence and practical performance over existing methods.

Findings

01 VR-SGD achieves faster convergence than SVRG and Prox-SVRG.

02 VR-SGD outperforms Katyusha and other stochastic methods in experiments.

03 Theoretical analysis confirms linear convergence for strongly convex problems.
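
For readers unfamiliar with the term, "linear convergence" in the third finding is usually stated as a geometric decay of the expected objective gap across epochs. The inequality below is the generic textbook form, not the paper's specific bound: the symbols F (objective), \tilde{x}^{s} (snapshot after epoch s), x^{*} (minimizer), and \rho (contraction factor) are notation assumed here, and the value of \rho for VR-SGD is not given on this page.

```latex
\mathbb{E}\!\left[ F(\tilde{x}^{s}) \right] - F(x^{*})
  \;\le\; \rho^{\,s} \left[ F(\tilde{x}^{0}) - F(x^{*}) \right],
\qquad 0 < \rho < 1 .
```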

Abstract

In this paper, we propose a simple variant of the original stochastic variance reduction gradient (SVRG) method, which we hereafter refer to as variance reduced stochastic gradient descent (VR-SGD). Unlike SVRG and its proximal variant, Prox-SVRG, VR-SGD sets the snapshot point and the starting point of each epoch to the average and the last iterate of the previous epoch, respectively. This setting allows us to use much larger learning rates or step sizes than SVRG, e.g., 3/(7L) for VR-SGD vs. 1/(10L) for SVRG, and also makes our convergence analysis more challenging. In fact, the larger learning rate enjoyed by VR-SGD means that the variance of its stochastic gradient estimator asymptotically approaches zero more rapidly. Unlike common stochastic methods such as SVRG and proximal stochastic methods such as Prox-SVRG, we design two different update…
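
The snapshot and starting-point choices described in the abstract can be illustrated with a minimal sketch. Everything here beyond those two choices and the two example step sizes is an assumption made for illustration: the ridge-regularized least-squares objective, the synthetic data, the epoch length m = 2n, the SVRG baseline using its last iterate as the snapshot, and all function names (`make_problem`, `run`, `demo`). It covers only a smooth problem and does not reproduce the paper's two update rules or its experiments.

```python
import random

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def grad_i(x, a, b, lam):
    """Gradient of f_i(x) = 0.5*(a.x - b)^2 + 0.5*lam*||x||^2."""
    r = dot(a, x) - b
    return [r * aj + lam * xj for aj, xj in zip(a, x)]

def full_grad(x, A, b, lam):
    """Average of the component gradients over all n samples."""
    n = len(A)
    g = [0.0] * len(x)
    for a_i, b_i in zip(A, b):
        g = [gj + gij / n for gj, gij in zip(g, grad_i(x, a_i, b_i, lam))]
    return g

def objective(x, A, b, lam):
    loss = sum(0.5 * (dot(a_i, x) - b_i) ** 2 for a_i, b_i in zip(A, b)) / len(A)
    return loss + 0.5 * lam * dot(x, x)

def make_problem(n=50, d=5, seed=0):
    """Hypothetical synthetic data: b = A @ x_true with Gaussian A."""
    rng = random.Random(seed)
    x_true = [rng.gauss(0, 1) for _ in range(d)]
    A = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    b = [dot(a_i, x_true) for a_i in A]
    return A, b

def run(A, b, lam, eta, epochs, m, variant, seed=0):
    rng = random.Random(seed)
    d = len(A[0])
    snapshot = [0.0] * d
    x = [0.0] * d
    for _ in range(epochs):
        mu = full_grad(snapshot, A, b, lam)  # full gradient at the snapshot
        if variant == "svrg":
            x = list(snapshot)  # SVRG: each epoch restarts from the snapshot
        avg = [0.0] * d
        for _ in range(m):
            i = rng.randrange(len(A))
            g_x = grad_i(x, A[i], b[i], lam)
            g_s = grad_i(snapshot, A[i], b[i], lam)
            # variance-reduced gradient estimator: g_x - g_s + mu
            x = [xj - eta * (gx - gs + muj)
                 for xj, gx, gs, muj in zip(x, g_x, g_s, mu)]
            avg = [aj + xj / m for aj, xj in zip(avg, x)]
        if variant == "vr-sgd":
            snapshot = avg      # snapshot = average iterate of this epoch;
                                # x is kept, so the next epoch starts at the last iterate
        else:
            snapshot = list(x)  # assumed SVRG baseline: last iterate as snapshot
    return x

def demo():
    A, b = make_problem()
    lam = 1e-3
    L = max(dot(a_i, a_i) for a_i in A) + lam  # smoothness constant of the f_i
    m = 2 * len(A)
    x0 = [0.0] * len(A[0])
    return {
        "initial": objective(x0, A, b, lam),
        "svrg": objective(run(A, b, lam, 1 / (10 * L), 20, m, "svrg"), A, b, lam),
        "vr-sgd": objective(run(A, b, lam, 3 / (7 * L), 20, m, "vr-sgd"), A, b, lam),
    }
```

Calling `demo()` returns the objective value at the zero vector and after 20 epochs of each variant, using the step sizes 1/(10L) and 3/(7L) quoted in the abstract. The only structural difference between the two variants is the pair of lines that set `snapshot` and decide whether `x` is reset at the start of an epoch.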

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Domain Adaptation and Few-Shot Learning