Variance Reduction via Accelerated Dual Averaging for Finite-Sum Optimization

Chaobing Song; Yong Jiang; Yi Ma

arXiv:2006.10281 · math.OC · March 9, 2021 · 6 cites

TL;DR

This paper introduces VRADA, a simplified variance reduction method for finite-sum convex optimization. It reaches an $O(1/n)$-accurate solution in $O(n \log\log n)$ stochastic gradient evaluations, and it uses one unified algorithm and analysis for both general convex and strongly convex problems.

Contribution

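VRADA is a single algorithm for both general convex and strongly convex finite-sum problems. It improves the best-known $O(n \log n)$ bound on stochastic gradient evaluations to $O(n \log\log n)$ and simplifies implementation and convergence analysis.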

Findings

01

Achieves $O(n\,\log\log n)$ gradient evaluations for $O(1/n)$ accuracy.

02

Matches the lower bound of the general convex setting up to a $\log\log n$ factor.

03

Demonstrates superior performance on real datasets.

Abstract

In this paper, we introduce a simplified and unified method for finite-sum convex optimization, named Variance Reduction via Accelerated Dual Averaging (VRADA). In both general convex and strongly convex settings, VRADA can attain an $O(\frac{1}{n})$-accurate solution in $O(n \log\log n)$ number of stochastic gradient evaluations which improves the best-known result $O(n \log n)$, where $n$ is the number of samples. Meanwhile, VRADA matches the lower bound of the general convex setting up to a $\log\log n$ factor and matches the lower bounds in both regimes $n \leq \Theta(\kappa)$ and $n \gg \kappa$ of the strongly convex setting, where $\kappa$ denotes the condition number. Besides improving the best-known results and matching all the above lower bounds simultaneously, VRADA has more unified and simplified algorithmic implementation and convergence analysis for both the…
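
To give a rough feel for the gap between the $O(n \log n)$ and $O(n \log\log n)$ bounds quoted in the abstract, the following minimal sketch compares the two growth terms for a few sample sizes. It is only an illustration: natural logarithms are assumed, all hidden constants in the $O(\cdot)$ notation are ignored, and the helper names (`evals_n_log_n`, `evals_n_loglog_n`, `bound_ratio`) are hypothetical and do not come from the paper.

```python
import math


def evals_n_log_n(n: int) -> float:
    """Growth term n * log(n), with the O(.) constant dropped (assumption)."""
    return n * math.log(n)


def evals_n_loglog_n(n: int) -> float:
    """Growth term n * log(log(n)), with the O(.) constant dropped (assumption)."""
    return n * math.log(math.log(n))


def bound_ratio(n: int) -> float:
    """Ratio of the two growth terms, which simplifies to log(n) / log(log(n))."""
    return evals_n_log_n(n) / evals_n_loglog_n(n)


if __name__ == "__main__":
    # Sample sizes chosen purely for illustration.
    for n in (10**3, 10**6, 10**9):
        print(f"n = {n:>10}: n log n = {evals_n_log_n(n):.3e}, "
              f"n log log n = {evals_n_loglog_n(n):.3e}, "
              f"ratio = {bound_ratio(n):.2f}")
```

The ratio grows slowly with $n$, so the improvement in the bound is a modest factor at practical sample sizes; it says nothing about measured running time, which depends on the constants this sketch leaves out.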

Videos

Variance Reduction via Accelerated Dual Averaging for Finite-Sum Optimization · SlidesLive

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and Algorithms