SAAGs: Biased Stochastic Variance Reduction Methods for Large-scale Learning

Vinod Kumar Chauhan; Anuj Sharma; Kalpana Dahiya

arXiv:1807.08934·cs.LG·April 9, 2019

TL;DR

This paper introduces novel biased stochastic variance reduction methods, SAAG-III and IV, for large-scale learning, utilizing a new step size strategy and extending to non-smooth problems with proven linear convergence.

Contribution

The paper proposes two new variants, SAAG-III and IV, with novel initialization and step size strategies, extending variance reduction methods to non-smooth problems and proving their convergence.

Findings

01. SAAG-III and IV outperform existing methods in experiments.

02. Theoretical proof of linear convergence for SAAG-IV.

03. Effective in large-scale and non-smooth optimization tasks.

Abstract

Stochastic approximation is one of the effective approaches for dealing with large-scale machine learning problems, and recent research has focused on reducing the variance caused by noisy approximations of the gradients. In this paper, we propose novel variants of SAAG-I and II (Stochastic Average Adjusted Gradient) (Chauhan et al. 2017), called SAAG-III and IV, respectively. Unlike SAAG-I, SAAG-III sets the starting point to the average of the previous epoch; unlike SAAG-II, SAAG-IV sets the snap point and the starting point to the average and the last iterate of the previous epoch, respectively. To determine the step size, we use Stochastic Backtracking-Armijo line Search (SBAS), which performs line search only on the selected mini-batch of data points. Since backtracking line search is not suitable for large-scale problems and the constants used to find the step size, like Lipschitz…
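
The idea of running a backtracking-Armijo line search on the selected mini-batch only can be illustrated with a minimal sketch. This is not the paper's SBAS procedure or the SAAG update rule, neither of which is given in the text above. It is the textbook Armijo backtracking loop, applied to a plain mini-batch gradient step on an L2-regularized logistic loss. The function names, the loss, and the constants `alpha0`, `beta`, and `c` are all assumptions made for illustration.

```python
import numpy as np


def minibatch_loss_grad(w, X, y, lam=1e-3):
    """L2-regularized logistic loss and its gradient on one mini-batch.

    Labels y are assumed to be in {-1, +1}. The loss is an illustrative choice.
    """
    z = y * (X @ w)
    loss = np.mean(np.logaddexp(0.0, -z)) + 0.5 * lam * (w @ w)
    # sigmoid(-z) = 1 / (1 + exp(z)), written with tanh for numerical stability
    sig_neg = 0.5 * (1.0 - np.tanh(0.5 * z))
    grad = X.T @ (-y * sig_neg) / X.shape[0] + lam * w
    return loss, grad


def armijo_step_on_batch(w, X, y, alpha0=1.0, beta=0.5, c=1e-4, max_backtracks=30):
    """One gradient step whose size is found by backtracking on the mini-batch only.

    alpha0, beta, c and max_backtracks are hypothetical defaults, not values
    taken from the paper.
    """
    loss, grad = minibatch_loss_grad(w, X, y)
    direction = -grad
    slope = grad @ direction  # equals -||grad||^2, so it is non-positive
    alpha = alpha0
    for _ in range(max_backtracks):
        new_loss, _ = minibatch_loss_grad(w + alpha * direction, X, y)
        # Armijo sufficient-decrease condition, evaluated on the mini-batch
        if new_loss <= loss + c * alpha * slope:
            break
        alpha *= beta  # shrink the step and try again
    # If the loop runs out of backtracks, the last (very small) alpha is returned.
    return alpha, w + alpha * direction


# Usage on synthetic data (all values are made up for the example)
rng = np.random.default_rng(0)
X_all = rng.normal(size=(200, 5))
y_all = np.sign(X_all @ rng.normal(size=5))
w0 = np.zeros(5)
batch = rng.choice(200, size=32, replace=False)
alpha, w1 = armijo_step_on_batch(w0, X_all[batch], y_all[batch])
```

Each trial step costs one loss evaluation on the mini-batch rather than on the full dataset, which is what makes a line search of this kind affordable when the number of data points is large.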
