Breaking the Span Assumption Yields Fast Finite-Sum Minimization

Robert Hannah; Yanli Liu; Daniel O'Connor; Wotao Yin

arXiv:1805.07786 · math.OC · May 22, 2018 · NeurIPS · 1 citation

TL;DR

This paper demonstrates that SVRG and SARAH, which do not follow the span assumption, can be modified to outperform finite-sum minimization algorithms that do follow it (such as SAGA, SAG, and SDCA), particularly in the big data regime where the condition number $\kappa = O(n)$.

Contribution

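The paper shows that SVRG and SARAH, which do not follow the span assumption because their updates combine full-gradient and component-gradient information, can be modified to converge faster than span-assumption algorithms. It also proves lower bounds showing that this speedup is optimal.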

Findings

01

Modified SVRG and SARAH are up to $\Omega(1 + (\ln(n/\kappa))_{+})$ times faster than algorithms that follow the span assumption.

02

Speedup is (1) in big data regimes where ( ) iterations are sufficient for desired accuracy.

03

Lower bounds confirm the optimality of the proposed speedup.
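
The factor in the first finding can be evaluated directly. The sketch below computes only the quantity inside the bound, $1 + (\ln(n/\kappa))_{+}$, because the constant hidden by $\Omega(\cdot)$ is not given here; the function name `speedup_factor` is a hypothetical illustration. For $\kappa = n^{\beta}$ with $\beta < 1$ the factor equals $1 + (1-\beta)\ln n$, and it drops to 1 once $\kappa \ge n$.

```python
import math


def speedup_factor(n: float, kappa: float) -> float:
    """Hypothetical helper: return 1 + max(ln(n / kappa), 0).

    This is only the expression inside Omega(.); the unstated constant
    is ignored, so the result is an order of magnitude, not a timing ratio.
    """
    if n <= 0 or kappa <= 0:
        raise ValueError("n and kappa must be positive")
    return 1.0 + max(math.log(n / kappa), 0.0)


# Illustrative values: n = 10**6 components, kappa = n**beta.
n = 10**6
for beta in (0.25, 0.5, 1.0, 1.5):
    kappa = n ** beta
    print(f"beta={beta}: factor ~ {speedup_factor(n, kappa):.2f}")
```

The factor grows only logarithmically in $n/\kappa$, so the gain is largest when the problem is well conditioned relative to the number of components.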

Abstract

In this paper, we show that SVRG and SARAH can be modified to be fundamentally faster than all of the other standard algorithms that minimize the sum of $n$ smooth functions, such as SAGA, SAG, SDCA, and SDCA without duality. Most finite sum algorithms follow what we call the "span assumption": Their updates are in the span of a sequence of component gradients chosen in a random IID fashion. In the big data regime, where the condition number $\kappa = O(n)$, the span assumption prevents algorithms from converging to an approximate solution of accuracy $\epsilon$ in less than $n \ln(1/\epsilon)$ iterations. SVRG and SARAH do not follow the span assumption since they are updated with a hybrid of full-gradient and component-gradient information. We show that because of this, they can be up to $\Omega(1 + (\ln(n/\kappa))_{+})$ times faster. In particular, to obtain an accuracy $\epsilon…
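
To make the abstract's distinction concrete, here is a minimal NumPy sketch of the standard, unmodified SVRG and SARAH updates on a hypothetical least-squares problem $f_i(x) = \tfrac12 (a_i^\top x - b_i)^2$. Besides IID-sampled component gradients, each epoch uses a full gradient (SVRG's snapshot gradient, or the full gradient that restarts SARAH's recursion); this is the hybrid the abstract says takes these methods outside the span assumption. The paper's modifications are not reproduced here, and the helper names (`component_grad`, `full_grad`, `svrg`, `sarah`), the step size $0.1/L_{\max}$, the epoch count, and the inner-loop length $2n$ are illustrative assumptions.

```python
import numpy as np

# Sketch of standard SVRG/SARAH (not the paper's modified variants);
# all helper names and parameters below are hypothetical.


def component_grad(A, b, x, i):
    """Gradient of the i-th component f_i(x) = 0.5 * (A[i] @ x - b[i])**2."""
    return (A[i] @ x - b[i]) * A[i]


def full_grad(A, b, x):
    """Gradient of F(x) = (1/n) * sum_i f_i(x)."""
    return A.T @ (A @ x - b) / A.shape[0]


def svrg(A, b, step, epochs, inner, rng):
    """Plain SVRG: each inner step mixes a snapshot full gradient with component gradients."""
    x = np.zeros(A.shape[1])
    for _ in range(epochs):
        snapshot = x.copy()
        mu = full_grad(A, b, snapshot)  # full-gradient information, once per epoch
        for _ in range(inner):
            i = rng.integers(A.shape[0])  # component index sampled IID
            v = component_grad(A, b, x, i) - component_grad(A, b, snapshot, i) + mu
            x = x - step * v
    return x


def sarah(A, b, step, epochs, inner, rng):
    """Plain SARAH: a recursive gradient estimate restarted from a full gradient each epoch."""
    x = np.zeros(A.shape[1])
    for _ in range(epochs):
        v = full_grad(A, b, x)  # full-gradient information at the start of the epoch
        x_prev, x = x, x - step * v
        for _ in range(inner):
            i = rng.integers(A.shape[0])  # component index sampled IID
            v = component_grad(A, b, x, i) - component_grad(A, b, x_prev, i) + v
            x_prev, x = x, x - step * v
    return x  # last iterate (a common practical choice)


# Hypothetical test problem: a consistent least-squares system with minimizer x_true.
rng = np.random.default_rng(0)
n, d = 200, 5
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true
L_max = np.max(np.sum(A**2, axis=1))  # largest component smoothness constant ||a_i||^2
x_svrg = svrg(A, b, step=0.1 / L_max, epochs=30, inner=2 * n, rng=rng)
x_sarah = sarah(A, b, step=0.1 / L_max, epochs=30, inner=2 * n, rng=rng)
print("SVRG error: ", np.linalg.norm(x_svrg - x_true))
print("SARAH error:", np.linalg.norm(x_sarah - x_true))
```

In this sketch both methods pay for one full gradient per epoch and two component-gradient evaluations per inner step; the iterates therefore depend on full-gradient information, not only on a sequence of IID component gradients.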

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Complexity and Algorithms in Graphs · Cryptography and Data Security