Analysis of nonsmooth stochastic approximation: the differential inclusion approach
Szymon Majewski, Błażej Miasojedow, Eric Moulines

TL;DR
This paper extends the convergence analysis of stochastic approximation methods to nonsmooth, non-convex problems by using a differential inclusion framework, applicable to algorithms like stochastic subgradient and proximal stochastic gradient descent.
Contribution
It adapts the mean-limit approach to nonsmooth problems, providing a unified convergence framework for various stochastic approximation algorithms.
Findings
Establishes convergence of stochastic subgradient and proximal stochastic gradient descent in nonsmooth, nonconvex settings.
Extends the differential inclusion approach to constrained and unconstrained problems, with either explicit or implicit updates.
Applies to deep learning and to high-dimensional statistical inference with sparsity-inducing penalties.
Abstract
In this paper we address the convergence of stochastic approximation when the functions to be minimized are nonconvex and nonsmooth. We show that the "mean-limit" approach to convergence, which leads to the ODE approach for smooth problems, can be adapted to the nonsmooth case. Under appropriate assumptions, the limiting dynamical system can be shown to be a differential inclusion. Our results expand earlier work in this direction by Benaïm et al. (2005) and provide a general framework for proving convergence for unconstrained and constrained stochastic approximation problems, with either explicit or implicit updates. In particular, our results allow us to establish the convergence of stochastic subgradient and proximal stochastic gradient descent algorithms arising in a large class of deep learning and high-dimensional statistical inference problems with sparsity-inducing penalties.
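To make the "explicit or implicit updates" wording concrete, here is a minimal sketch of a proximal stochastic gradient iteration with a sparsity-inducing ℓ1 penalty. It is an illustration only and is not taken from the paper: the squared loss, the step-size schedule, and the names `soft_threshold` and `prox_sgd_l1` are assumptions chosen for the example. The smooth loss is handled by an explicit gradient step on one random sample, and the nonsmooth penalty by an implicit (proximal) step.

```python
import random


def soft_threshold(x, tau):
    """Proximal operator of tau * |x| (soft-thresholding)."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def prox_sgd_l1(data, lam, n_steps, seed=0):
    """Proximal stochastic gradient sketch for
    (1/2) * E[(a . x - b)^2] + lam * ||x||_1.

    `data` is a list of (a, b) pairs, with `a` a list of floats.
    The objective and the step schedule are illustrative assumptions.
    """
    rng = random.Random(seed)
    dim = len(data[0][0])
    x = [0.0] * dim
    for n in range(1, n_steps + 1):
        # Decreasing steps: the sum diverges, the sum of squares converges.
        gamma = 1.0 / (n + 10) ** 0.75
        a, b = rng.choice(data)  # draw one random sample
        residual = sum(ai * xi for ai, xi in zip(a, x)) - b
        # Explicit step on the smooth loss, then proximal step on the penalty.
        x = [
            soft_threshold(xi - gamma * residual * ai, gamma * lam)
            for ai, xi in zip(a, x)
        ]
    return x
```

Replacing the proximal step with a step along any subgradient of the penalty would give a stochastic subgradient variant of the same sketch.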
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Sparse and Compressive Sensing Techniques
