On the Convergence to a Global Solution of Shuffling-Type Gradient Algorithms
Lam M. Nguyen, Trang H. Tran

TL;DR
This paper proves that shuffling SGD, a practical variant of stochastic gradient descent, converges to a global solution for certain non-convex functions in over-parameterized models, under relaxed assumptions.
Contribution
It provides the first convergence analysis of shuffling SGD for non-convex functions under relaxed assumptions, matching the computational complexity that shuffling SGD achieves in the general convex setting.
Findings
Convergence to global solutions under over-parameterization
Relaxed non-convex assumptions compared to prior work
Maintains computational complexity similar to convex settings
Abstract
The stochastic gradient descent (SGD) algorithm is the method of choice in many machine learning tasks thanks to its scalability and efficiency in dealing with large-scale problems. In this paper, we focus on the shuffling version of SGD, which matches the mainstream practical heuristics. We show the convergence to a global solution of shuffling SGD for a class of non-convex functions under over-parameterized settings. Our analysis employs more relaxed non-convex assumptions than previous literature. Nevertheless, we maintain the desired computational complexity that shuffling SGD has achieved in the general convex setting.
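To make the shuffling scheme concrete, here is a minimal sketch of SGD with per-epoch reshuffling, where every epoch visits each component gradient exactly once in a fresh random order. It is not the paper's algorithm or step-size rule: the names `shuffling_sgd`, `grad_i`, `residual`, and `loss`, the constant step size, and the toy problem are all assumptions made for illustration. The toy problem is an over-parameterized least-squares fit (more parameters than samples), which is convex and so only illustrates the interpolation regime, not the non-convex class the paper analyzes.

```python
import random


def shuffling_sgd(grad_i, w0, n, epochs, lr, seed=0):
    """SGD that visits all n component gradients once per epoch, in a reshuffled order.

    grad_i(w, i) is a hypothetical callback returning the gradient of the
    i-th component function at w as a list of floats.
    """
    rng = random.Random(seed)
    w = list(w0)
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)  # draw a new random permutation for this epoch
        for i in order:
            g = grad_i(w, i)
            w = [wj - lr * gj for wj, gj in zip(w, g)]
    return w


# Toy over-parameterized least squares (an assumption, not from the paper):
# n = 20 samples, d = 50 parameters, so an exact interpolating solution exists.
data_rng = random.Random(1)
n, d = 20, 50
A = []
for _ in range(n):
    row = [data_rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = sum(x * x for x in row) ** 0.5
    A.append([x / norm for x in row])  # unit-norm rows keep lr = 0.5 stable
w_star = [data_rng.gauss(0.0, 1.0) for _ in range(d)]
b = [sum(a * w for a, w in zip(row, w_star)) for row in A]


def residual(w, i):
    """Prediction error of sample i at parameters w."""
    return sum(a * wj for a, wj in zip(A[i], w)) - b[i]


def grad_i(w, i):
    """Gradient of the i-th component f_i(w) = 0.5 * residual(w, i) ** 2."""
    r = residual(w, i)
    return [r * a for a in A[i]]


def loss(w):
    """Average of the component losses over all n samples."""
    return sum(0.5 * residual(w, i) ** 2 for i in range(n)) / n


w0 = [0.0] * d
initial_loss = loss(w0)
w_final = shuffling_sgd(grad_i, w0, n, epochs=500, lr=0.5)
final_loss = loss(w_final)
```

Comparing `final_loss` with `initial_loss` shows the training loss being driven toward zero, which is possible in this toy case because the model can fit every sample exactly. The step size here is fixed for simplicity; any schedule used in the paper's analysis is not reproduced.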
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and ELM
