Momentum-based variance-reduced proximal stochastic gradient method for composite nonconvex stochastic optimization
Yangyang Xu, Yibo Xu

TL;DR
The paper introduces PStorm, a momentum-based variance-reduced proximal stochastic gradient method for nonconvex nonsmooth problems. PStorm achieves the optimal $O(\varepsilon^{-3})$ complexity while using only one or $O(1)$ samples per iteration, which makes it suitable for online learning and large-scale applications.
Contribution
PStorm is a new stochastic gradient method that attains the optimal complexity using only one or $O(1)$ samples per update. Existing optimal methods, by contrast, require a large number of samples in some or all iterations.
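The single-sample update can be sketched as follows. This is a minimal illustration under several assumptions. It uses a STORM-style recursive momentum estimator followed by a proximal step, an $\ell_1$ regularizer $r(x) = \lambda\|x\|_1$, and a constant step size `eta` and momentum weight `beta`. The function names, constants, and toy problem are hypothetical. They are not the authors' reference code or the parameter schedules analyzed in the paper.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||x||_1: elementwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def pstorm_sketch(grad_sample, draw_sample, x0, lam, eta=0.05, beta=0.1,
                  iters=2000, seed=0):
    """Hypothetical sketch of a momentum-based variance-reduced proximal SGM.

    grad_sample(x, xi): stochastic gradient of the smooth part f(x; xi).
    draw_sample(rng): returns one sample xi.
    eta, beta: constant step size and momentum weight (assumed values,
    not the schedules analyzed in the paper).
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    d = grad_sample(x, draw_sample(rng))  # initial estimate from one sample
    for _ in range(iters):
        # Proximal step on r(x) = lam * ||x||_1.
        x_new = soft_threshold(x - eta * d, eta * lam)
        # Draw exactly one fresh sample and evaluate it at both iterates.
        xi = draw_sample(rng)
        # Recursive momentum estimator: new gradient plus a damped correction.
        d = grad_sample(x_new, xi) + (1.0 - beta) * (d - grad_sample(x, xi))
        x = x_new
    return x


# Toy usage: sparse least squares with f(x; (a, b)) = 0.5 * (a @ x - b)**2.
x_true = np.array([1.0, 0.0, 0.0, -2.0, 0.0])


def draw_sample(rng):
    a = rng.standard_normal(x_true.size)
    return a, a @ x_true


def grad_sample(x, xi):
    a, b = xi
    return a * (a @ x - b)


x_hat = pstorm_sketch(grad_sample, draw_sample, np.zeros(5), lam=0.01)
print(np.round(x_hat, 3))
```

Because `a` has identity covariance in this toy setup, the population objective is $\frac{1}{2}\|x - x_{\text{true}}\|^2 + \lambda\|x\|_1$. Its minimizer is the soft-thresholded $x_{\text{true}}$, roughly `[0.99, 0, 0, -1.99, 0]`, and the iterate should land near it. The toy problem is convex only so that the answer can be checked. The method itself targets problems whose smooth part is nonconvex.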
Findings
PStorm achieves the optimal $O(\varepsilon^{-3})$ complexity for nonconvex nonsmooth problems.
PStorm performs well in online learning scenarios with real-time decision requirements.
PStorm demonstrates better generalization with small-batch training in large-scale neural network tasks.
Abstract
Stochastic gradient methods (SGMs) have been extensively used for solving stochastic problems or large-scale machine learning problems. Recent works employ various techniques to improve the convergence rate of SGMs for both convex and nonconvex cases. Most of them require a large number of samples in some or all iterations of the improved SGMs. In this paper, we propose a new SGM, named PStorm, for solving nonconvex nonsmooth stochastic problems. With a momentum-based variance reduction technique, PStorm can achieve the optimal complexity result $O(\varepsilon^{-3})$ to produce a stochastic $\varepsilon$-stationary solution, if a mean-squared smoothness condition holds. Different from existing optimal methods, PStorm can achieve the $O(\varepsilon^{-3})$ result by using only one or $O(1)$ samples in every update. With this property, PStorm can be applied to online learning problems…
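For reference, the composite setting and the mean-squared smoothness condition mentioned in the abstract are commonly written as shown below. The notation ($f$, $r$, $\xi$, $L$) is assumed here for illustration. So is the premise that $r$ is closed and convex with an inexpensive proximal operator. Neither is guaranteed to match the paper's exact statement.

$$\min_{x \in \mathbb{R}^n} \; F(x) := \mathbb{E}_{\xi}\big[f(x;\xi)\big] + r(x),$$

$$\mathbb{E}_{\xi}\,\big\|\nabla f(x;\xi) - \nabla f(y;\xi)\big\|^2 \le L^2 \|x - y\|^2 \quad \text{for all } x, y.$$

The second condition asks that each sampled gradient be Lipschitz in a mean-squared sense. This is the property that lets a recursive estimator reuse one sample at two consecutive iterates without its error growing.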
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and ELM
