TL;DR
The paper introduces the stochastic average gradient (SAG) method for minimizing a finite sum of smooth convex functions. SAG keeps the low per-iteration cost of stochastic gradient (SG) methods, which is independent of the number of terms in the sum, while achieving faster convergence rates.
Contribution
SAG incorporates a memory of previously computed gradient values. This lets it converge faster than black-box SG methods at the same per-iteration cost, with the largest gain in the strongly convex case, where the rate becomes linear.
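The summary does not write out the update itself. As a sketch in the notation commonly used for SAG (the symbols g, f_i, y_i, i_k and \alpha_k are assumptions, not taken from the text above), the problem is to minimize g(x) = (1/n) \sum_i f_i(x), and each iteration samples one index i_k, refreshes only that term's stored gradient, and steps along the average of all stored gradients:

```latex
x^{k+1} = x^k - \frac{\alpha_k}{n} \sum_{i=1}^{n} y_i^k,
\qquad
y_i^k =
\begin{cases}
f_i'(x^k) & \text{if } i = i_k, \\
y_i^{k-1} & \text{otherwise.}
\end{cases}
```

Only one gradient f_{i_k}'(x^k) is evaluated per iteration, which is why the iteration cost does not depend on n; the price is storing one gradient per term.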
Findings
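For general convex sums, SAG improves the convergence rate from the O(1/√k) of black-box SG methods to O(1/k).
When the sum is strongly convex, SAG improves the sub-linear O(1/k) rate of SG methods to a linear rate of the form O(p^k) with p < 1.
In many cases SAG also converges faster than black-box deterministic gradient methods, measured in the number of gradient evaluations.
Numerical experiments indicate that SAG often dramatically outperforms existing SG methods.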
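The gradient-memory idea behind these rates can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`sag`, `make_least_squares`, `grad_i`), the noise-free least-squares test problem, the zero initialization of the stored gradients, and the constant step size 1/(16L) are all assumptions chosen for illustration, since the summary above states none of them.

```python
import random


def sag(grad_i, n, dim, step, iters, seed=0):
    """Minimal SAG sketch: minimize (1/n) * sum_i f_i(x).

    grad_i(i, x) must return the gradient of f_i at x as a list of floats.
    """
    rng = random.Random(seed)
    x = [0.0] * dim
    memory = [[0.0] * dim for _ in range(n)]  # last gradient seen for each f_i (assumed zero at start)
    total = [0.0] * dim                       # running sum of all stored gradients
    for _ in range(iters):
        i = rng.randrange(n)                  # sample one term uniformly at random
        g = grad_i(i, x)                      # the only gradient evaluated this iteration
        for j in range(dim):
            total[j] += g[j] - memory[i][j]   # replace the old stored gradient with the new one
            x[j] -= step * total[j] / n       # step along the average of the stored gradients
        memory[i] = g
    return x


def make_least_squares(n, x_true, seed=1):
    """Hypothetical test problem: f_i(x) = 0.5 * (a_i . x - b_i)^2 with b_i = a_i . x_true."""
    rng = random.Random(seed)
    A = [[rng.uniform(-1.0, 1.0) for _ in x_true] for _ in range(n)]
    b = [sum(a_j * t_j for a_j, t_j in zip(a, x_true)) for a in A]
    return A, b


x_true = [2.0, -1.0]
A, b = make_least_squares(20, x_true)


def grad_i(i, x):
    # Gradient of 0.5 * (a_i . x - b_i)^2 is (a_i . x - b_i) * a_i.
    r = sum(a_j * x_j for a_j, x_j in zip(A[i], x)) - b[i]
    return [r * a_j for a_j in A[i]]


# Each f_i has Lipschitz-continuous gradient with constant ||a_i||^2; take the largest.
L = max(sum(a_j * a_j for a_j in a) for a in A)
x_hat = sag(grad_i, n=len(A), dim=2, step=1.0 / (16.0 * L), iters=20000)
print(x_hat)
```

Keeping `total` up to date incrementally is what makes each iteration cost O(dim) rather than O(n * dim); the trade-off is the O(n * dim) memory for the stored gradients. Because this test problem is strongly convex and noise-free, the iterate should approach `x_true`.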
Abstract
We propose the stochastic average gradient (SAG) method for optimizing the sum of a finite number of smooth convex functions. Like stochastic gradient (SG) methods, the SAG method's iteration cost is independent of the number of terms in the sum. However, by incorporating a memory of previous gradient values the SAG method achieves a faster convergence rate than black-box SG methods. The convergence rate is improved from O(1/k^{1/2}) to O(1/k) in general, and when the sum is strongly-convex the convergence rate is improved from the sub-linear O(1/k) to a linear convergence rate of the form O(p^k) for p < 1. Further, in many cases the convergence rate of the new method is also faster than black-box deterministic gradient methods, in terms of the number of gradient evaluations. Numerical experiments indicate that the new algorithm often dramatically outperforms existing SG and…
