Log-Normal Multiplicative Dynamics for Stable Low-Precision Training of Large Networks

Keigo Nishida; Eren Mehmet Kıral; Kenichi Bannai; Mohammad Emtiyaz Khan; Thomas Möllenhoff

arXiv:2506.17768·cs.LG·June 24, 2025

TL;DR

This paper introduces Log-Normal Multiplicative Dynamics (LMD), a biologically inspired training algorithm that enables stable and accurate low-precision training of large neural networks such as Vision Transformer and GPT-2.

Contribution

The paper derives a Bayesian learning rule that assumes log-normal posterior distributions over the weights. The rule yields a new multiplicative update algorithm, inspired by biological synapses, that improves the stability of low-precision training.

Findings

1. LMD achieves stable training with low-precision computations.

2. LMD performs well on Vision Transformer and GPT-2.

3. The method is as easy to implement as Adam and requires storing only one additional vector.

Abstract

Studies in neuroscience have shown that biological synapses follow a log-normal distribution whose transitioning can be explained by noisy multiplicative dynamics. Biological networks can function stably even under dynamically fluctuating conditions arising due to unreliable synaptic transmissions. Here we ask: Is it possible to design similar multiplicative training in artificial neural networks? To answer this question, we derive a Bayesian learning rule that assumes log-normal posterior distributions over weights which gives rise to a new Log-Normal Multiplicative Dynamics (LMD) algorithm. The algorithm uses multiplicative updates with both noise and regularization applied multiplicatively. The method is as easy to implement as Adam and only requires one additional vector to store. Our results show that LMD achieves stable and accurate training-from-scratch under low-precision…
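To make the idea of a noisy multiplicative update concrete, here is a minimal sketch in plain Python. It is not the paper's LMD algorithm, whose exact update rule is not given in the text above. It assumes a simple parametrization in which each weight is stored as a fixed sign times `exp(log_mag)`, so that Gaussian noise on `log_mag` produces log-normally distributed magnitudes and a gradient step on `log_mag` rescales the weight instead of shifting it. The function names, the toy quadratic loss, and the values of `lr` and `sigma` are all illustrative assumptions, and the sketch omits the multiplicative regularization mentioned in the abstract.

```python
import math
import random


def noisy_multiplicative_step(log_mag, sign, grad_fn, lr, sigma, rng):
    """One illustrative update; weights are sign * exp(log_mag) (an assumption)."""
    # Multiplicative noise: Gaussian noise added in log space makes the
    # sampled magnitudes log-normally distributed.
    sampled = [s * math.exp(m + sigma * rng.gauss(0.0, 1.0))
               for m, s in zip(log_mag, sign)]
    grads = grad_fn(sampled)
    # Chain rule: d(loss)/d(log_mag) = d(loss)/d(w) * w, so a gradient step on
    # log_mag multiplies |w| by exp(-lr * g * w) and never flips its sign.
    return [m - lr * g * w for m, g, w in zip(log_mag, grads, sampled)]


def to_weights(log_mag, sign):
    return [s * math.exp(m) for m, s in zip(log_mag, sign)]


# Toy problem (hypothetical): pull two weights toward fixed targets.
TARGET = [2.0, -0.5]


def loss(w):
    return 0.5 * sum((wi - ti) ** 2 for wi, ti in zip(w, TARGET))


def grad(w):
    return [wi - ti for wi, ti in zip(w, TARGET)]


rng = random.Random(0)          # seeded for reproducibility
sign = [1.0, -1.0]              # signs are fixed in this parametrization
log_mag = [0.0, 0.0]            # start at magnitudes of 1
start_loss = loss(to_weights(log_mag, sign))
for _ in range(500):
    log_mag = noisy_multiplicative_step(log_mag, sign, grad,
                                        lr=0.05, sigma=0.1, rng=rng)
final = to_weights(log_mag, sign)
final_loss = loss(final)
print(final, start_loss, final_loss)
```

One consequence of this parametrization is that a weight can shrink toward zero or grow, but it cannot change sign, so the signs in the toy problem are chosen to match the targets.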
