TL;DR
VRAdam is a physics-inspired optimizer that stabilizes training by dynamically damping weight updates, leading to improved performance over standard optimizers like AdamW across diverse deep learning tasks.
Contribution
Introduces VRAdam, a novel optimizer combining velocity-based regularization with Adam, supported by theoretical analysis and extensive empirical benchmarking.
Findings
VRAdam reduces oscillations and accelerates convergence.
VRAdam outperforms AdamW on multiple tasks.
Theoretical convergence rate of O(ln(N)/√N) established.
Abstract
We introduce Velocity-Regularized Adam (VRAdam), a physics-inspired optimizer for training deep neural networks that draws on the stabilizing effect that quartic kinetic-energy terms have on various system dynamics. Previous algorithms, including the ubiquitous Adam, operate in the so-called adaptive edge-of-stability regime during training, leading to rapid oscillations and slowed convergence of the loss. In contrast, VRAdam adds a higher-order, velocity-based penalty to the learning rate, so that the algorithm automatically slows down whenever weight updates become large. In practice, we observe that the effective dynamic learning rate shrinks in high-velocity regimes, damping oscillations. By combining this velocity-based regularizer for global damping with the per-parameter scaling of Adam, we create a powerful hybrid optimizer. For this optimizer, we provide rigorous…
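The mechanism described in the abstract, a global learning rate that shrinks as the update velocity grows while Adam's per-parameter scaling is kept, can be illustrated with a minimal sketch. The abstract does not give the exact update rule, so everything specific here is an assumption made for illustration: the velocity is taken to be Adam's first-moment buffer, the damping has the form `lr / (1 + min(beta3 * ||m||^2, lr_cap))`, and the names `vradam_step`, `init_state`, `beta3`, and `lr_cap` are hypothetical. It should not be read as the authors' implementation.

```python
import math


def init_state(n):
    """Create zeroed optimizer state for n scalar parameters."""
    return {"t": 0, "m": [0.0] * n, "v": [0.0] * n}


def vradam_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999,
                beta3=1.0, lr_cap=19.0, eps=1e-8, weight_decay=0.0):
    """One step of a velocity-damped Adam sketch on plain Python lists.

    ASSUMPTIONS (not taken from the source): the velocity is the
    first-moment buffer m, and the effective learning rate is
    lr / (1 + min(beta3 * ||m||^2, lr_cap)).
    """
    state["t"] += 1
    t = state["t"]
    m, v = state["m"], state["v"]

    # Standard Adam exponential moving averages of the gradient and its square.
    for i, g in enumerate(grads):
        m[i] = beta1 * m[i] + (1 - beta1) * g
        v[i] = beta2 * v[i] + (1 - beta2) * g * g

    # Global damping: a large squared velocity norm shrinks the step size.
    # The cap bounds the shrinkage, so lr_eff never drops below lr / (1 + lr_cap).
    vel_sq = sum(x * x for x in m)
    lr_eff = lr / (1.0 + min(beta3 * vel_sq, lr_cap))

    # Bias correction, as in Adam.
    bc1 = 1 - beta1 ** t
    bc2 = 1 - beta2 ** t

    new_params = []
    for i, p in enumerate(params):
        m_hat = m[i] / bc1
        v_hat = v[i] / bc2
        p = p - lr_eff * weight_decay * p                 # decoupled decay, AdamW-style
        p = p - lr_eff * m_hat / (math.sqrt(v_hat) + eps)  # per-parameter Adam scaling
        new_params.append(p)
    return new_params, lr_eff


if __name__ == "__main__":
    state = init_state(2)
    params = [1.0, -1.0]
    for grads in ([0.01, 0.01], [10.0, 10.0], [1000.0, 1000.0]):
        params, lr_eff = vradam_step(params, grads, state)
        print(f"grads={grads}  effective lr={lr_eff:.3e}")
```

In this sketch the damping is global: a single scalar rescales the step for all parameters, while the division by the square root of the second moment remains per-parameter. The cap is one possible way to keep the effective learning rate from collapsing to zero when gradients spike.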