Nonconvex Stochastic Bregman Proximal Gradient Method with Application to Deep Learning

Kuangyu Ding; Jingyang Li; Kim-Chuan Toh

arXiv:2306.14522 · math.OC · January 22, 2025 · 2 cites

TL;DR

This paper introduces stochastic Bregman proximal gradient methods that handle non-Lipschitz gradients in nonconvex optimization, demonstrating robustness and efficiency in deep learning and inverse problems.

Contribution

It proposes SBPG and MSBPG algorithms that relax Lipschitz smoothness assumptions, with proven convergence and practical effectiveness in neural network training.

Findings

1. SBPG achieves optimal sample complexity in nonconvex settings.

2. MSBPG relaxes mini-batch size requirements while maintaining convergence.

3. Experimental results show robustness and effectiveness in deep learning applications.

Abstract

Stochastic gradient methods for minimizing nonconvex composite objective functions typically rely on the Lipschitz smoothness of the differentiable part, but this assumption fails in many important problem classes like quadratic inverse problems and neural network training, leading to instability of the algorithms in both theory and practice. To address this, we propose a family of stochastic Bregman proximal gradient (SBPG) methods that only require smooth adaptivity. SBPG replaces the quadratic approximation in SGD with a Bregman proximity measure, offering a better approximation model that handles non-Lipschitz gradients in nonconvex objectives. We establish the convergence properties of vanilla SBPG and show it achieves optimal sample complexity in the nonconvex setting. Experimental results on quadratic inverse problems demonstrate SBPG's robustness in terms of stepsize selection…
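
To make the idea of replacing the quadratic model with a Bregman proximity measure concrete, here is a minimal sketch of one Bregman gradient step. Everything specific in it is an assumption for illustration rather than something stated in the abstract: the kernel h(x) = (1/4)||x||^4 + (1/2)||x||^2 (a common choice for quadratic inverse problems), the absence of a nonsmooth term, and the function names `grad_h`, `mirror_inverse`, and `sbpg_step`. Under those assumptions the update solves grad_h(x_new) = grad_h(x) - step * g, which reduces to a scalar cubic in the norm of x_new.

```python
import math


def grad_h(x):
    """Gradient of the assumed kernel h(x) = 0.25*||x||^4 + 0.5*||x||^2."""
    sq_norm = sum(v * v for v in x)
    return [(sq_norm + 1.0) * v for v in x]


def _cbrt(v):
    """Real cube root that also handles negative inputs."""
    return math.copysign(abs(v) ** (1.0 / 3.0), v)


def mirror_inverse(p):
    """Solve grad_h(x) = p for x.

    Since grad_h(x) = (||x||^2 + 1) * x, x is parallel to p and t = ||x||
    is the unique real root of t^3 + t - ||p|| = 0 (Cardano's formula).
    """
    c = math.sqrt(sum(v * v for v in p))
    s = math.sqrt(c * c / 4.0 + 1.0 / 27.0)
    t = _cbrt(c / 2.0 + s) + _cbrt(c / 2.0 - s)
    return [v / (1.0 + t * t) for v in p]


def sbpg_step(x, stoch_grad, step):
    """One Bregman gradient step with no nonsmooth term (an assumption).

    x_new = argmin_u <g, u> + (1/step) * D_h(u, x),
    i.e. grad_h(x_new) = grad_h(x) - step * g.
    """
    p = [gh - step * g for gh, g in zip(grad_h(x), stoch_grad)]
    return mirror_inverse(p)


# Hypothetical usage: x is the current iterate, g a stochastic gradient estimate.
x = [1.0, -2.0]
g = [0.5, 0.3]
x_new = sbpg_step(x, g, step=0.1)
```

With the Euclidean kernel h(x) = (1/2)||x||^2 the same step collapses to the familiar SGD update x - step * g; the quartic term in the assumed kernel shortens the step when the iterate has a large norm.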

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Sparse and Compressive Sensing Techniques

Methods: Stochastic Gradient Descent