Taming neural networks with TUSLA: Non-convex learning via adaptive stochastic gradient Langevin algorithms
Attila Lovas, Iosif Lytras, Miklós Rásonyi, Sotirios Sabanis

TL;DR
This paper introduces TUSLA, a tamed, adaptive variant of the stochastic gradient Langevin algorithm for training neural networks whose non-convex loss functions have superlinearly growing gradients. It provides finite-time convergence guarantees and, in the reported numerical experiments, outperforms vanilla SGLD.
Contribution
The paper proposes TUSLA, a taming-based variant of SGLD, with theoretical convergence analysis and empirical evidence demonstrating its advantages in non-convex neural network training.
Findings
TUSLA comes with finite-time guarantees for finding approximate minimizers of both empirical and population risks in non-convex problems.
Numerical experiments show TUSLA outperforms vanilla SGLD in neural network training.
Theoretical analysis confirms the effectiveness of taming techniques for superlinear coefficients.
Abstract
Artificial neural networks (ANNs) are typically highly nonlinear systems which are finely tuned via the optimization of their associated, non-convex loss functions. In many cases, the gradient of any such loss function has superlinear growth, making the use of the widely-accepted (stochastic) gradient descent methods, which are based on Euler numerical schemes, problematic. We offer a new learning algorithm based on an appropriately constructed variant of the popular stochastic gradient Langevin dynamics (SGLD), which is called tamed unadjusted stochastic Langevin algorithm (TUSLA). We also provide a nonasymptotic analysis of the new algorithm's convergence properties in the context of non-convex learning problems with the use of ANNs. Thus, we provide finite-time guarantees for TUSLA to find approximate minimizers of both empirical and population risks. The roots of the TUSLA algorithm…
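To illustrate why superlinear gradient growth is a problem for Euler-type updates and what taming changes, here is a minimal sketch, not the authors' implementation. It assumes a taming factor of the form 1 + sqrt(step) * |theta|^(2r), which the abstract above does not spell out. The double-well toy loss, the function names, and the values of the step size, inverse temperature beta, and r are all hypothetical choices made for the demonstration.

```python
import numpy as np


def grad_double_well(theta):
    # Gradient of the toy loss U(theta) = sum(theta_i^4 / 4 - theta_i^2 / 2).
    # It grows cubically, i.e. superlinearly, in theta.
    return theta**3 - theta


def langevin_step(theta, grad, step, beta, rng, r=None):
    # One Euler-type Langevin update. If r is given, the gradient is tamed.
    g = grad(theta)
    if r is not None:
        # Assumed taming factor: grows with |theta| so the drift stays bounded
        # relative to the step size.
        g = g / (1.0 + np.sqrt(step) * np.linalg.norm(theta) ** (2 * r))
    noise = rng.standard_normal(theta.shape)
    return theta - step * g + np.sqrt(2.0 * step / beta) * noise


def run(theta0, n_steps, step, beta, r=None, seed=0):
    # Iterate the update from theta0; r=None gives the untamed (vanilla) scheme.
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    # Overflow in the untamed run is expected here, so silence the warnings.
    with np.errstate(over="ignore", invalid="ignore"):
        for _ in range(n_steps):
            theta = langevin_step(theta, grad_double_well, step, beta, rng, r)
    return theta


theta0 = [10.0, -10.0]  # deliberately far from the minimizers at +/-1
vanilla = run(theta0, n_steps=200, step=0.1, beta=1e4)
tamed = run(theta0, n_steps=200, step=0.1, beta=1e4, r=1)

print("vanilla SGLD-style iterate:", vanilla)
print("tamed iterate:             ", tamed)
```

From this starting point the untamed iterates overshoot by a growing margin on every step and eventually overflow. The tamed iterates contract toward the minimizers of the toy loss at the same step size. This is only a one-off illustration on a hand-picked example and says nothing about the rates proved in the paper.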
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Neural Networks and Applications
Methods: Diffusion
