SGD with Variance Reduction beyond Empirical Risk Minimization
Massil Achab (CMAP), Agathe Guilloux (LSTA), Stéphane Gaïffas (CMAP), Emmanuel Bacry (CMAP)

TL;DR
This paper presents a doubly stochastic proximal gradient algorithm for objectives whose gradients depend on numerically expensive expectations. Its main application is faster optimization of the regularized Cox partial likelihood used in survival analysis.
Contribution
The paper introduces a new algorithm combining stochastic gradient descent with variance reduction and MCMC-based expectation approximation, applicable beyond traditional empirical risk minimization.
Findings
Achieves a linear convergence rate under strong convexity and a sublinear rate without it.
Improves on the state of the art in regularized Cox partial-likelihood optimization.
Derives conditions on the number of MCMC iterations that guarantee convergence.
Abstract
We introduce a doubly stochastic proximal gradient algorithm for optimizing a finite average of smooth convex functions, whose gradients depend on numerically expensive expectations. Our main motivation is the acceleration of the optimization of the regularized Cox partial-likelihood (the core model used in survival analysis), but our algorithm can be used in different settings as well. The proposed algorithm is doubly stochastic in the sense that gradient steps are done using stochastic gradient descent (SGD) with variance reduction, where the inner expectations are approximated by a Markov chain Monte Carlo (MCMC) algorithm. We derive conditions on the number of MCMC iterations guaranteeing convergence, and obtain a linear rate of convergence under strong convexity and a sublinear rate without this assumption. We illustrate the fact that our algorithm improves the state-of-the-art…
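To make the "doubly stochastic" idea concrete, here is a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: an SVRG-style update as the variance-reduction scheme, an L1 penalty as the regularizer, an independence Metropolis-Hastings chain with uniform proposals as the MCMC sampler, a fixed chain length `n_mcmc`, an exact full gradient at each snapshot, and data with no censoring or ties whose rows are sorted by increasing event time. All function names are hypothetical.

```python
import numpy as np


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (the L1 penalty is an assumption)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def objective(X, theta, lam):
    """Average negative log partial likelihood plus L1 penalty.
    Assumes rows sorted by increasing event time, no censoring, no ties,
    so the risk set of row i is rows i, ..., n-1."""
    s = X @ theta
    log_denom = np.log(np.cumsum(np.exp(s)[::-1])[::-1])
    return np.mean(log_denom - s) + lam * np.abs(theta).sum()


def exact_full_gradient(X, theta):
    """Exact gradient of the smooth part, via reverse cumulative sums."""
    w = np.exp(X @ theta)
    denom = np.cumsum(w[::-1])[::-1]
    numer = np.cumsum((w[:, None] * X)[::-1], axis=0)[::-1]
    return np.mean(numer / denom[:, None] - X, axis=0)


def mcmc_mean(X_risk, theta, n_steps, rng):
    """Estimate E_pi[x_j] over the rows of X_risk, where pi(j) is
    proportional to exp(x_j . theta), with an independence
    Metropolis-Hastings chain using uniform proposals.
    Only the proposed rows are scored, never the whole risk set."""
    proposals = rng.integers(0, X_risk.shape[0], size=n_steps)
    prop_scores = X_risk[proposals] @ theta
    log_u = np.log(rng.random(n_steps))
    j, score_j = 0, X_risk[0] @ theta  # start the chain at the first row
    total = np.zeros(X_risk.shape[1])
    for k in range(n_steps):
        # accept with probability min(1, pi(proposal) / pi(current))
        if log_u[k] < prop_scores[k] - score_j:
            j, score_j = proposals[k], prop_scores[k]
        total += X_risk[j]
    return total / n_steps


def doubly_stochastic_prox_svrg(X, lam=0.01, step=0.05, n_epochs=5,
                                n_mcmc=30, seed=0):
    """SVRG-style proximal updates whose per-sample gradients use
    MCMC estimates of the inner expectation (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(n_epochs):
        snapshot = theta.copy()
        full_grad = exact_full_gradient(X, snapshot)
        for _ in range(n):
            i = rng.integers(n)  # first source of randomness: sample index
            # second source of randomness: MCMC estimates of the expectation
            mean_new = mcmc_mean(X[i:], theta, n_mcmc, rng)
            mean_old = mcmc_mean(X[i:], snapshot, n_mcmc, rng)
            # grad f_i = E_pi[x] - x_i, so the -x_i terms cancel here
            v = mean_new - mean_old + full_grad
            theta = soft_threshold(theta - step * v, step * lam)
    return theta
```

Calling `doubly_stochastic_prox_svrg(X)` on a time-sorted feature matrix `X` returns the final iterate. The fixed `n_mcmc` is a simplification: the abstract states that convergence depends on conditions on the number of MCMC iterations, which this sketch does not attempt to reproduce.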
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Statistical Methods and Inference · Sparse and Compressive Sensing Techniques
