Taming neural networks with TUSLA: Non-convex learning via adaptive stochastic gradient Langevin algorithms

Attila Lovas, Iosif Lytras, Miklós Rásonyi, Sotirios Sabanis

arXiv:2006.14514 · cs.LG · January 18, 2023 · 5 cites

Open Access

TL;DR

This paper introduces TUSLA, an adaptive (tamed) stochastic gradient Langevin algorithm for training neural networks with non-convex loss functions. It comes with finite-time convergence guarantees and outperforms vanilla SGLD in the reported numerical experiments.

Contribution

The paper proposes TUSLA, a taming-based variant of SGLD, with theoretical convergence analysis and empirical evidence demonstrating its advantages in non-convex neural network training.

Findings

01 TUSLA comes with finite-time convergence guarantees for non-convex optimization.

02 Numerical experiments show TUSLA outperforming vanilla SGLD in neural network training.

03 The theoretical analysis supports taming as a way to handle gradients with superlinear growth.

Abstract

Artificial neural networks (ANNs) are typically highly nonlinear systems which are finely tuned via the optimization of their associated, non-convex loss functions. In many cases, the gradient of any such loss function has superlinear growth, making the use of the widely-accepted (stochastic) gradient descent methods, which are based on Euler numerical schemes, problematic. We offer a new learning algorithm based on an appropriately constructed variant of the popular stochastic gradient Langevin dynamics (SGLD), which is called tamed unadjusted stochastic Langevin algorithm (TUSLA). We also provide a nonasymptotic analysis of the new algorithm's convergence properties in the context of non-convex learning problems with the use of ANNs. Thus, we provide finite-time guarantees for TUSLA to find approximate minimizers of both empirical and population risks. The roots of the TUSLA algorithm…
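To make the idea of taming concrete, the sketch below compares a plain Euler-type Langevin step with a tamed one on a toy potential whose gradient grows superlinearly. Everything in it is an illustrative assumption rather than the paper's stated method: the taming factor `1 + sqrt(step) * |theta|^(2r)`, the toy potential, the exact (non-stochastic) gradient, the function names `grad_toy`, `langevin_step`, and `run`, and the values chosen for `step`, `beta`, and `r`.

```python
import numpy as np


def grad_toy(theta):
    """Gradient of the toy potential U(theta) = |theta|^4 / 4 - |theta|^2 / 2.

    It grows like |theta|^3, i.e. superlinearly. (Hypothetical test problem.)
    """
    return (np.dot(theta, theta) - 1.0) * theta


def langevin_step(theta, grad, step, beta, rng, r=None):
    """One Euler-type Langevin step; tamed when r is given.

    Assumed taming (not taken from the text above): divide the gradient
    by 1 + sqrt(step) * |theta|^(2r). With r=None no taming is applied.
    """
    if r is None:
        taming = 1.0
    else:
        taming = 1.0 + np.sqrt(step) * np.linalg.norm(theta) ** (2 * r)
    noise = rng.standard_normal(theta.shape)
    return theta - step * grad / taming + np.sqrt(2.0 * step / beta) * noise


def run(theta0, n_steps, step=0.1, beta=1e8, r=None, seed=0):
    """Iterate langevin_step from theta0 and return the final iterate."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    # Overflow is expected in the untamed run, so silence those warnings.
    with np.errstate(over="ignore", invalid="ignore"):
        for _ in range(n_steps):
            theta = langevin_step(theta, grad_toy(theta), step, beta, rng, r=r)
    return theta


untamed = run([10.0, 10.0], n_steps=50)      # plain Euler-type step
tamed = run([10.0, 10.0], n_steps=500, r=1)  # tamed step
print("untamed iterate finite:", bool(np.all(np.isfinite(untamed))))
print("tamed iterate norm:", float(np.linalg.norm(tamed)))
```

In this toy setting the untamed iterate blows up from a far-away starting point, while the tamed iterate settles near the ring of minimizers at `|theta| = 1`. In a learning problem, `grad_toy` would be replaced by a stochastic gradient of the loss evaluated on a data sample.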

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Neural Networks and Applications

Methods: Diffusion