Adaptive Stochastic Variance Reduction for Non-convex Finite-Sum Minimization
Ali Kavis, Stratis Skoulakis, Kimon Antonakopoulos, Leello Tadesse Dadi, Volkan Cevher

TL;DR
AdaSpider is an adaptive variance-reduction algorithm for non-convex finite-sum optimization that requires no prior knowledge of problem parameters and achieves an oracle complexity that is optimal up to logarithmic factors.
Contribution
Introduces AdaSpider, which is, to the authors' knowledge, the first parameter-free non-convex variance-reduction method; its complexity bound is optimal up to logarithmic factors.
Findings
Achieves $\tilde{O}\left(n + \frac{\sqrt{n}}{\epsilon^2}\right)$ oracle calls for $\epsilon$-stationary points.
Does not require knowledge of smoothness constant, target accuracy, or gradient bounds.
Matches the lower bound complexity up to logarithmic factors.
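To give a feel for the size of this bound, the short sketch below compares $n + \sqrt{n}/\epsilon^2$ with the $n/\epsilon^2$ oracle calls of plain full-batch gradient descent (about $1/\epsilon^2$ steps, each touching all $n$ components). Constants and logarithmic factors are dropped, and the function names are hypothetical, so the numbers are only indicative.

```python
import math

def variance_reduced_calls(n: int, eps: float) -> float:
    """n + sqrt(n) / eps^2, ignoring constants and logarithmic factors."""
    return n + math.sqrt(n) / eps**2

def full_gradient_calls(n: int, eps: float) -> float:
    """n / eps^2: about 1/eps^2 gradient-descent steps over all n components."""
    return n / eps**2

n, eps = 10_000, 0.1
ratio = full_gradient_calls(n, eps) / variance_reduced_calls(n, eps)
print(f"full-batch / variance-reduced oracle calls: {ratio:.1f}")  # roughly 50 here
```

The gap widens as $n$ grows, since the dominant term scales with $\sqrt{n}$ rather than $n$.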
Abstract
We propose an adaptive variance-reduction method, called AdaSpider, for minimization of $L$-smooth, non-convex functions with a finite-sum structure. In essence, AdaSpider combines an AdaGrad-inspired [Duchi et al., 2011; McMahan & Streeter, 2010], yet fairly distinct, adaptive step-size schedule with the recursive stochastic path-integrated estimator proposed in [Fang et al., 2018]. To our knowledge, AdaSpider is the first parameter-free non-convex variance-reduction method, in the sense that it does not require knowledge of problem-dependent parameters such as the smoothness constant $L$, the target accuracy $\epsilon$, or any bound on gradient norms. In doing so, we are able to compute an $\epsilon$-stationary point with $\tilde{O}\left(n + \sqrt{n}/\epsilon^2\right)$ oracle calls, which matches the respective lower bound up to logarithmic factors.
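The combination described in the abstract can be sketched roughly as follows. This is an illustrative sketch, not the authors' algorithm: the recursive estimator follows the general SPIDER pattern (a full gradient every `n` steps, single-sample corrections in between), while the step-size formula is an assumed AdaGrad-norm style rule chosen for illustration, since the exact schedule and constants are not stated in this summary. The name `adaspider_sketch`, the callback `grad_i`, and the choice to return the last iterate are all hypothetical.

```python
import numpy as np

def adaspider_sketch(grad_i, x0, n, num_iters, seed=0):
    """SPIDER-style estimator with an assumed AdaGrad-norm step size.

    grad_i(i, x) must return the gradient of the i-th component f_i at x.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    x_prev = x.copy()
    v = np.zeros_like(x)   # running gradient estimate
    sq_norm_sum = 0.0      # accumulated squared norms of the estimates
    for t in range(num_iters):
        if t % n == 0:
            # Checkpoint: recompute the full gradient over all n components.
            v = sum(grad_i(i, x) for i in range(n)) / n
        else:
            # Recursive correction using one uniformly sampled component.
            i = int(rng.integers(n))
            v = grad_i(i, x) - grad_i(i, x_prev) + v
        sq_norm_sum += float(v @ v)
        # Assumed step size: depends only on n and observed estimates,
        # never on a smoothness constant or a target accuracy.
        gamma = 1.0 / (n ** 0.25 * np.sqrt(np.sqrt(n) + sq_norm_sum))
        x_prev, x = x, x - gamma * v
    return x  # last iterate, for simplicity

# Toy sanity check (convex, for illustration only):
# f_i(x) = 0.5 * ||x - c_i||^2, whose average is minimized at the mean of c_i.
centers = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
x_out = adaspider_sketch(lambda i, x: x - centers[i],
                         x0=np.array([5.0, -3.0]), n=4, num_iters=400)
```

The step size shrinks automatically as squared gradient-estimate norms accumulate, which is the sense in which no problem-dependent parameter has to be supplied by the user.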
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and Algorithms
