Log-Normal Multiplicative Dynamics for Stable Low-Precision Training of Large Networks
Keigo Nishida, Eren Mehmet Kıral, Kenichi Bannai, Mohammad Emtiyaz Khan, Thomas Möllenhoff

TL;DR
This paper introduces Log-Normal Multiplicative Dynamics (LMD), a biologically inspired training algorithm that enables stable, accurate low-precision training of large neural networks such as Vision Transformer and GPT-2.
Contribution
The paper derives a Bayesian learning rule that assumes log-normal posterior distributions over the weights. The rule yields a new multiplicative update algorithm, inspired by biological synapses, that improves the stability of low-precision training.
Findings
LMD trains stably and accurately from scratch under low-precision computation.
LMD performs well on Vision Transformer and GPT-2.
The method is as easy to implement as Adam and requires storing only one additional vector.
Abstract
Studies in neuroscience have shown that biological synapses follow a log-normal distribution whose transitioning can be explained by noisy multiplicative dynamics. Biological networks can function stably even under dynamically fluctuating conditions arising due to unreliable synaptic transmissions. Here we ask: Is it possible to design similar multiplicative training in artificial neural networks? To answer this question, we derive a Bayesian learning rule that assumes log-normal posterior distributions over weights, which gives rise to a new Log-Normal Multiplicative Dynamics (LMD) algorithm. The algorithm uses multiplicative updates with both noise and regularization applied multiplicatively. The method is as easy to implement as Adam and only requires one additional vector to store. Our results show that LMD achieves stable and accurate training-from-scratch under low-precision…
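Illustrative sketch
The abstract describes updates in which the step, the noise, and the regularization all act multiplicatively on the weights. The following is a minimal sketch of that general idea only, and it is not the LMD algorithm derived in the paper. The function name `multiplicative_step`, the parameters `lr`, `sigma`, and `decay`, and the exact form of the update are assumptions made for illustration. The sketch perturbs each weight with log-normal noise, then takes a gradient step on the log-magnitude of the weight, so every change is a positive scaling factor and weights never change sign.

```python
import math
import random


def multiplicative_step(w, grad_fn, lr=0.1, sigma=0.05, decay=0.01, rng=random):
    """One noisy multiplicative update on a list of nonzero weights.

    Hypothetical illustration only; NOT the LMD update from the paper.
    """
    # Multiplicative log-normal noise: scale each weight by exp(sigma * eps).
    noisy = [wi * math.exp(sigma * rng.gauss(0.0, 1.0)) for wi in w]
    g = grad_fn(noisy)
    # Gradient step in log-magnitude space: d(loss)/d(log|w|) = g * w.
    # The decay term (an assumed regularizer) also shrinks magnitudes
    # multiplicatively rather than by subtraction.
    return [
        wi * math.exp(-lr * (gi * ni + decay))
        for wi, ni, gi in zip(w, noisy, g)
    ]


# Toy usage: quadratic loss 0.5 * (w - target)^2 per coordinate.
target = [0.5, -1.0]


def quadratic_grad(w):
    return [wi - ti for wi, ti in zip(w, target)]


weights = [1.0, -2.0]
for _ in range(200):
    weights = multiplicative_step(weights, quadratic_grad, rng=random.Random(0))
```

Because each update multiplies by a positive factor, a weight's sign is fixed at initialization in this sketch. Whether and how the paper handles signs is not stated in the abstract above.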
