Polygonal Unadjusted Langevin Algorithms: Creating stable and efficient adaptive algorithms for neural networks

Dong-Young Lim; Sotirios Sabanis

arXiv:2105.13937·cs.LG·March 5, 2024·1 cites

Open Access · 2 Repos

TL;DR

This paper introduces a new Langevin-based optimization algorithm, TheoPouLa, that offers improved stability and convergence for neural network training, addressing issues like vanishing gradients and outperforming existing adaptive optimizers.

Contribution

The paper develops a novel Langevin-based algorithm with theoretical convergence guarantees, leveraging Euler polygonal approximations for SDEs, and demonstrates its superior performance in deep learning tasks.

Findings

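01 TheoPouLa outperforms many popular adaptive optimizers in experiments on several types of deep learning models.

02 A nonasymptotic analysis gives theoretical guarantees for convergence; stability is inherited from tamed algorithms.

03 The algorithm addresses the vanishing gradient problem in neural networks.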

Abstract

We present a new class of Langevin-based algorithms, which overcomes many of the known shortcomings of popular adaptive optimizers that are currently used for the fine tuning of deep learning models. Its underpinning theory relies on recent advances of Euler's polygonal approximations for stochastic differential equations (SDEs) with monotone coefficients. As a result, it inherits the stability properties of tamed algorithms, while it addresses other known issues, e.g. vanishing gradients in neural networks. In particular, we provide a nonasymptotic analysis and full theoretical guarantees for the convergence properties of an algorithm of this novel class, which we named TH$\varepsilon$O POULA (or, simply, TheoPouLa). Finally, several experiments are presented with different types of deep learning models, which show the superior performance of TheoPouLa over many popular adaptive…
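To make the idea of a tamed Langevin algorithm concrete, here is a minimal sketch of one Euler-type step of the Langevin SDE with a componentwise tamed drift. It is not the TheoPouLa update from the paper, whose exact form is not given on this page. The taming factor `g / (1 + sqrt(step) * |g|)`, the quartic test objective, and all function and parameter names (`tamed_gradient`, `tamed_langevin_step`, `run`, `step`, `beta`) are assumptions chosen for illustration.

```python
import math
import random


def tamed_gradient(grad, step):
    """Componentwise taming (illustrative choice): g / (1 + sqrt(step) * |g|).

    Each tamed component is bounded in magnitude by 1 / sqrt(step),
    so one update moves a coordinate by less than sqrt(step).
    """
    root = math.sqrt(step)
    return [g / (1.0 + root * abs(g)) for g in grad]


def tamed_langevin_step(theta, grad_fn, step, beta, rng):
    """One Euler-type step of the Langevin SDE with a tamed drift.

    theta_next = theta - step * tamed_grad + sqrt(2 * step / beta) * xi,
    where xi is standard Gaussian noise and beta is the inverse temperature.
    """
    drift = tamed_gradient(grad_fn(theta), step)
    noise_scale = math.sqrt(2.0 * step / beta)
    return [
        t - step * d + noise_scale * rng.gauss(0.0, 1.0)
        for t, d in zip(theta, drift)
    ]


def quartic_grad(theta):
    """Gradient of u(theta) = sum(theta_i ** 4) / 4, which grows superlinearly."""
    return [t ** 3 for t in theta]


def run(theta0, n_steps=20000, step=0.01, beta=1e8, seed=0):
    """Iterate the tamed step from theta0 with a fixed random seed."""
    rng = random.Random(seed)
    theta = list(theta0)
    for _ in range(n_steps):
        theta = tamed_langevin_step(theta, quartic_grad, step, beta, rng)
    return theta


if __name__ == "__main__":
    print(run([100.0, -50.0]))
```

With the same step size and starting point, an untamed explicit step on this objective overshoots immediately (from 100 to 100 - 0.01 * 100^3 = -9900) and then diverges, whereas the tamed step stays bounded. This is the kind of stability that tamed schemes are designed to provide for gradients that grow faster than linearly.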

Taxonomy

Topics: Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques · Neural Networks and Applications