A Stochastic Proximal Method for Nonsmooth Regularized Finite Sum Optimization
Dounia Lakhmiri, Dominique Orban, Andrea Lodi

TL;DR
This paper introduces SR2, a new stochastic proximal method for training neural networks with nonsmooth regularization, achieving better sparsity and accuracy without requiring knowledge of the gradient's Lipschitz constant.
Contribution
The paper proposes SR2, a novel adaptive quadratic regularization-based stochastic proximal algorithm with proven convergence and complexity for nonsmooth regularized optimization.
Findings
SR2 outperforms ProxGEN and ProxSGD in sparsity and accuracy on CIFAR datasets.
Achieves a worst-case iteration complexity of O(ε^{-2}) without knowledge of the gradient's Lipschitz constant.
Guarantees that a first-order stationarity measure converges to zero under certain conditions.
Abstract
We consider the problem of training a deep neural network with nonsmooth regularization to retrieve a sparse and efficient sub-structure. Our regularizer is only assumed to be lower semi-continuous and prox-bounded. We combine an adaptive quadratic regularization approach with proximal stochastic gradient principles to derive a new solver, called SR2, whose convergence and worst-case complexity are established without knowledge or approximation of the gradient's Lipschitz constant. We formulate a stopping criterion that ensures an appropriate first-order stationarity measure converges to zero under certain conditions. We establish a worst-case iteration complexity of O(ε^{-2}) that matches those of related methods like ProxGEN, where the learning rate is assumed to be related to the Lipschitz constant. Our experiments on network instances trained on CIFAR-10 and…
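To make the idea of combining adaptive quadratic regularization with a proximal stochastic gradient step more concrete, here is a minimal sketch. It is not the authors' SR2 implementation. It assumes a least-squares loss, an l1 regularizer (so the proximal operator is soft-thresholding), and a simple two-threshold rule for updating the regularization parameter. The function name `adaptive_prox_sketch` and the parameters `eta1`, `eta2`, `gamma`, and `sigma_min` are hypothetical choices made for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (assumed regularizer for this sketch)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def batch_loss(z, Ab, bb):
    """Mean least-squares loss on one mini-batch (assumed smooth term)."""
    r = Ab @ z - bb
    return 0.5 * (r @ r) / len(bb)

def adaptive_prox_sketch(A, b, lam=0.1, sigma=1.0, eta1=1e-4, eta2=0.75,
                         gamma=2.0, sigma_min=1e-8, batch=8, iters=300, seed=0):
    """Proximal mini-batch steps with an adaptively updated parameter sigma.

    No Lipschitz constant is supplied: sigma is increased when a trial step
    is rejected and decreased when the step is very successful.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = A.shape
    x = np.zeros(n_features)
    for _ in range(iters):
        idx = rng.choice(n_samples, size=batch, replace=False)
        Ab, bb = A[idx], b[idx]
        g = Ab.T @ (Ab @ x - bb) / batch  # mini-batch gradient at x

        # The trial point minimizes g's + (sigma/2)||s||^2 + lam*||x + s||_1.
        x_trial = soft_threshold(x - g / sigma, lam / sigma)
        s = x_trial - x

        # Decrease predicted by the first-order model vs. decrease observed.
        predicted = -(g @ s) + lam * (np.abs(x).sum() - np.abs(x_trial).sum())
        if predicted <= 1e-12:
            continue  # no measurable model decrease on this batch
        actual = (batch_loss(x, Ab, bb) + lam * np.abs(x).sum()
                  - batch_loss(x_trial, Ab, bb) - lam * np.abs(x_trial).sum())
        rho = actual / predicted

        if rho >= eta1:
            x = x_trial  # accept the step
            if rho >= eta2:
                sigma = max(sigma / gamma, sigma_min)  # allow longer steps
        else:
            sigma *= gamma  # reject the step and regularize more strongly
    return x
```

A possible call on synthetic data, where `A` and `b` are made up for the example:

```python
rng = np.random.default_rng(1)
A = rng.standard_normal((40, 10))
b = A @ np.concatenate([[2.0, -3.0, 1.5], np.zeros(7)])
x = adaptive_prox_sketch(A, b, lam=0.1)
print(np.count_nonzero(x), "nonzero coefficients")
```

The design point this illustrates is that the effective step size 1/sigma is tuned from the ratio of observed to predicted decrease, rather than being set from a known or estimated Lipschitz constant.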
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications
