Polygonal Unadjusted Langevin Algorithms: Creating stable and efficient adaptive algorithms for neural networks
Dong-Young Lim, Sotirios Sabanis

TL;DR
This paper introduces TheoPouLa, a new Langevin-based optimization algorithm for neural network training. It inherits the stability of tamed algorithms, addresses vanishing gradients, and outperforms many popular adaptive optimizers in the reported experiments.
Contribution
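The paper develops a new class of Langevin-based algorithms built on Euler's polygonal approximations for stochastic differential equations (SDEs) with monotone coefficients, gives a nonasymptotic convergence analysis for one algorithm of this class (TheoPouLa), and reports experiments on several types of deep learning models.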
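For orientation, the standard objects behind a Langevin-based optimizer can be written as below. This is textbook background rather than the paper's specific scheme, and the symbols are chosen here for illustration: u is the objective, β > 0 an inverse temperature, λ > 0 a step size, B_t a Brownian motion, and ξ_n independent standard Gaussian vectors.

```latex
\begin{align*}
  % Overdamped Langevin SDE whose invariant measure concentrates near minimizers of u
  \mathrm{d}\theta_t &= -\nabla u(\theta_t)\,\mathrm{d}t + \sqrt{2\beta^{-1}}\,\mathrm{d}B_t, \\
  % Euler discretization with step size \lambda (unadjusted Langevin algorithm)
  \theta_{n+1} &= \theta_n - \lambda\,\nabla u(\theta_n) + \sqrt{2\lambda\beta^{-1}}\,\xi_{n+1}.
\end{align*}
```

Tamed variants replace the gradient in the second line with a step-size-dependent modification whose growth is controlled, which is what keeps the iterates stable when the gradient grows faster than linearly. The particular modification used by TheoPouLa is not reproduced here.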
Findings
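In experiments with different types of deep learning models, TheoPouLa outperforms many popular adaptive optimizers.
A nonasymptotic analysis provides full theoretical guarantees for the algorithm's convergence.
The algorithm inherits the stability of tamed algorithms and addresses vanishing gradients in neural networks.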
Abstract
We present a new class of Langevin based algorithms, which overcomes many of the known shortcomings of popular adaptive optimizers that are currently used for the fine tuning of deep learning models. Its underpinning theory relies on recent advances of Euler's polygonal approximations for stochastic differential equations (SDEs) with monotone coefficients. As a result, it inherits the stability properties of tamed algorithms, while it addresses other known issues, e.g. vanishing gradients in neural networks. In particular, we provide a nonasymptotic analysis and full theoretical guarantees for the convergence properties of an algorithm of this novel class, which we named THεO POULA (or, simply, TheoPouLa). Finally, several experiments are presented with different types of deep learning models, which show the superior performance of TheoPouLa over many popular adaptive…
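To make "the stability properties of tamed algorithms" concrete, here is a minimal Python sketch of a generic, componentwise-tamed unadjusted Langevin step. It is not the TheoPouLa update from the paper. The function names (`tamed_langevin_step`, `grad_quartic`, `run`), the quartic toy objective, and the step size and inverse temperature values are all assumptions made for illustration.

```python
import math
import random


def tamed_langevin_step(theta, grad, step, beta, rng):
    """One componentwise-tamed unadjusted Langevin update (illustrative only)."""
    noise_scale = math.sqrt(2.0 * step / beta)
    new_theta = []
    for t, g in zip(theta, grad):
        # Taming: dividing by 1 + sqrt(step) * |g| caps the drift, so that
        # |step * tamed| < sqrt(step) however large the raw gradient is.
        tamed = g / (1.0 + math.sqrt(step) * abs(g))
        new_theta.append(t - step * tamed + noise_scale * rng.gauss(0.0, 1.0))
    return new_theta


def grad_quartic(theta):
    """Gradient of the toy objective u(theta) = sum(theta_i ** 4) / 4."""
    return [t ** 3 for t in theta]


def run(theta0, n_steps=2000, step=0.01, beta=1e8, seed=0):
    """Iterate the tamed update on the toy objective (hypothetical settings)."""
    rng = random.Random(seed)
    theta = list(theta0)
    for _ in range(n_steps):
        theta = tamed_langevin_step(theta, grad_quartic(theta), step, beta, rng)
    return theta


if __name__ == "__main__":
    print(run([50.0, -30.0]))
```

With the same step size, a plain Euler step from θ = 50 on this objective would jump to 50 − 0.01 · 50³ = −1200 and blow up on later iterations, whereas the tamed drift moves each coordinate by less than √0.01 = 0.1 per step. The taming shown here only controls large gradients; the abstract states that TheoPouLa also addresses vanishing gradients, which this sketch does not attempt.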
Taxonomy
Topics: Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques · Neural Networks and Applications
