Nonconvex Stochastic Bregman Proximal Gradient Method with Application to Deep Learning
Kuangyu Ding, Jingyang Li, Kim-Chuan Toh

TL;DR
This paper introduces stochastic Bregman proximal gradient methods that handle non-Lipschitz gradients in nonconvex optimization, demonstrating robustness and efficiency in deep learning and inverse problems.
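The following sketch makes the non-Lipschitz issue concrete. It assumes a standard quadratic inverse (phase-retrieval-type) loss, f(x) = (1/(4m)) Σ_i (⟨a_i, x⟩² − b_i)², chosen here only for illustration. The gradient of this loss is cubic in x, so the secant ratio ‖∇f(x) − ∇f(y)‖ / ‖x − y‖ grows without bound and no global Lipschitz constant exists. The helper names `grad_f` and `secant_ratio` and the toy data are hypothetical.

```python
import numpy as np

def grad_f(x, A, b):
    # Gradient of the assumed quadratic inverse loss
    # f(x) = (1/(4m)) * sum_i ((a_i^T x)^2 - b_i)^2
    u = A @ x                                  # u_i = <a_i, x>
    return A.T @ ((u ** 2 - b) * u) / len(b)

def secant_ratio(x, y, A, b):
    # ||grad f(x) - grad f(y)|| / ||x - y||; a global Lipschitz constant would bound this
    return np.linalg.norm(grad_f(x, A, b) - grad_f(y, A, b)) / np.linalg.norm(x - y)

A = np.eye(2)                                  # toy measurement vectors a_1 = e_1, a_2 = e_2
b = np.array([1.0, 1.0])
origin = np.zeros(2)
for t in [10.0, 100.0, 1000.0]:
    x = np.array([t, 0.0])
    # For this toy instance the ratio is (t^2 - 1) / 2, so it grows quadratically in t
    print(t, secant_ratio(x, origin, A, b))
```

Because the ratio keeps growing as t increases, any fixed stepsize tuned to a Lipschitz constant eventually becomes too large far from the origin.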
Contribution
The paper proposes SBPG and its momentum-based variant MSBPG, which relax the Lipschitz smoothness assumption, proves their convergence, and demonstrates their practical effectiveness in neural network training.
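For reference, the relaxed assumption and the resulting update can be written in the standard form used in the Bregman proximal gradient literature. The notation here (R for the nonsmooth part, α_k for the stepsize, \tilde{\nabla} f for a stochastic gradient estimate) is assumed and may not match the paper's exact statements.

```latex
% Bregman distance generated by a convex kernel h
D_h(x, y) = h(x) - h(y) - \langle \nabla h(y),\, x - y \rangle

% Smooth adaptivity (L-smad) of f relative to h: both Lh - f and Lh + f are convex,
% which is equivalent to
\bigl| f(x) - f(y) - \langle \nabla f(y),\, x - y \rangle \bigr| \le L\, D_h(x, y)

% Stochastic Bregman proximal gradient step with stepsize \alpha_k > 0
x^{k+1} \in \operatorname*{argmin}_{x} \Bigl\{ \langle \tilde{\nabla} f(x^k),\, x - x^k \rangle
  + R(x) + \tfrac{1}{\alpha_k} D_h(x, x^k) \Bigr\}
```

With the Euclidean kernel h(x) = ½‖x‖², D_h(x, y) = ½‖x − y‖², the L-smad inequality reduces to the usual descent inequality for an L-Lipschitz gradient, and the step reduces to proximal SGD (plain SGD when R = 0). A non-Euclidean kernel is what allows f to have a gradient that is not globally Lipschitz.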
Findings
SBPG achieves optimal sample complexity in nonconvex settings.
MSBPG relaxes mini-batch size requirements while maintaining convergence.
Experimental results show robustness and effectiveness in deep learning applications.
Abstract
Stochastic gradient methods for minimizing nonconvex composite objective functions typically rely on the Lipschitz smoothness of the differentiable part, but this assumption fails in many important problem classes like quadratic inverse problems and neural network training, leading to instability of the algorithms in both theory and practice. To address this, we propose a family of stochastic Bregman proximal gradient (SBPG) methods that only require smooth adaptivity. SBPG replaces the quadratic approximation in SGD with a Bregman proximity measure, offering a better approximation model that handles non-Lipschitz gradients in nonconvex objectives. We establish the convergence properties of vanilla SBPG and show it achieves optimal sample complexity in the nonconvex setting. Experimental results on quadratic inverse problems demonstrate SBPG's robustness in terms of stepsize selection…
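A minimal sketch of a stochastic Bregman gradient step follows, under several stated assumptions: no regularizer (R = 0), the quartic kernel h(x) = ¼‖x‖⁴ + ½‖x‖² commonly paired with quadratic inverse problems, the same assumed loss f(x) = (1/(4m)) Σ_i (⟨a_i, x⟩² − b_i)², and uniform mini-batch sampling. With R = 0 the Bregman step reduces to ∇h(x_{k+1}) = ∇h(x_k) − α g_k, and for this kernel ∇h can be inverted in closed form via Cardano's formula. All function names, the toy data, and the stepsize are hypothetical and not taken from the paper.

```python
import numpy as np

def loss(x, A, b):
    # Assumed quadratic inverse loss f(x) = (1/(4m)) * sum_i ((a_i^T x)^2 - b_i)^2
    return np.sum(((A @ x) ** 2 - b) ** 2) / (4 * len(b))

def minibatch_grad(x, A, b, idx):
    # Unbiased mini-batch estimate of grad f(x) using the rows listed in idx
    u = A[idx] @ x
    return A[idx].T @ ((u ** 2 - b[idx]) * u) / len(idx)

def kernel_grad(x):
    # grad h(x) for the kernel h(x) = ||x||^4 / 4 + ||x||^2 / 2
    return (x @ x + 1.0) * x

def kernel_grad_inverse(p):
    # Solve (||x||^2 + 1) x = p. The solution is parallel to p with norm s,
    # where s is the unique real root of s^3 + s - ||p|| = 0 (Cardano's formula).
    r = np.linalg.norm(p)
    if r == 0.0:
        return np.zeros_like(p)
    d = np.sqrt(r * r / 4.0 + 1.0 / 27.0)
    s = np.cbrt(r / 2.0 + d) + np.cbrt(r / 2.0 - d)
    return (s / r) * p

def sbpg(A, b, x0, step, batch, iters, seed=0):
    # With R = 0 the Bregman proximal step is a mirror step:
    # grad h(x_{k+1}) = grad h(x_k) - step * g_k
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(iters):
        idx = rng.choice(len(b), size=batch, replace=False)
        g = minibatch_grad(x, A, b, idx)
        x = kernel_grad_inverse(kernel_grad(x) - step * g)
    return x

# Toy noiseless instance (hypothetical data): b_i = (a_i^T x_true)^2
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5))
x_true = rng.standard_normal(5)
b = (A @ x_true) ** 2
x0 = rng.standard_normal(5)
x = sbpg(A, b, x0, step=0.002, batch=20, iters=2000)
print(loss(x0, A, b), loss(x, A, b))
```

Replacing the kernel with h(x) = ½‖x‖² would turn the same loop into plain mini-batch SGD. The quartic term in h damps the effective step roughly by a factor of 1/(‖x‖² + 1), which grows in step with the cubic gradient of f and is the intuition behind the stepsize robustness the abstract describes.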
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Sparse and Compressive Sensing Techniques
Methods: Stochastic Gradient Descent
