Improving the Adaptive Moment Estimation (ADAM) stochastic optimizer through an Implicit-Explicit (IMEX) time-stepping approach
Abhinab Bhattacharjee, Andrey A. Popov, Arash Sarshar, Adrian Sandu

TL;DR
This paper reinterprets Adam as an IMEX Euler discretization of an underlying ODE and introduces higher-order IMEX methods to improve neural network training performance.
Contribution
It presents a novel view of Adam as a first-order IMEX Euler scheme and develops higher-order IMEX-based optimizers that outperform classical Adam on several regression and classification problems.
Findings
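Solving the underlying ODE with higher-order IMEX methods improves training results relative to the first-order scheme that corresponds to classical Adam.
The new algorithm outperforms classical Adam on several regression and classification problems.
Improved convergence behavior is observed in the experiments.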
Abstract
The Adam optimizer, often used in Machine Learning for neural network training, corresponds to an underlying ordinary differential equation (ODE) in the limit of very small learning rates. This work shows that the classical Adam algorithm is a first-order implicit-explicit (IMEX) Euler discretization of the underlying ODE. Employing the time discretization point of view, we propose new extensions of the Adam scheme obtained by using higher-order IMEX methods to solve the ODE. Based on this approach, we derive a new optimization algorithm for neural network training that performs better than classical Adam on several regression and classification problems.
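The time-stepping view described in the abstract can be made concrete with a minimal sketch. The code below is the standard, widely published Adam update, not the authors' new higher-order algorithm, arranged so that the order of evaluation within one step is visible: the gradient is evaluated at the old parameters, while the parameter update uses the freshly updated moments. Reading this ordering as the "explicit" and "implicit" parts of an IMEX Euler step is an assumption made here for illustration; this summary does not state the paper's exact ODE or splitting. The names `adam_step` and `quadratic_grad` are hypothetical, and NumPy is assumed.

```python
import numpy as np


def adam_step(theta, m, v, grad_fn, step, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One classical Adam update, viewed as a single time step of size lr.

    Illustrative reading (an assumption, not taken from the paper): the
    gradient uses the old parameters, the parameter update uses the new moments.
    """
    g = grad_fn(theta)  # gradient at the current (old) parameters

    # First and second moment estimates (exponential moving averages).
    m_new = beta1 * m + (1.0 - beta1) * g
    v_new = beta2 * v + (1.0 - beta2) * g**2

    # Bias correction; step counts from 1, so the denominators are nonzero.
    m_hat = m_new / (1.0 - beta1**step)
    v_hat = v_new / (1.0 - beta2**step)

    # Parameter update driven by the *updated* moments.
    theta_new = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta_new, m_new, v_new


def quadratic_grad(theta):
    """Gradient of the toy loss f(theta) = 0.5 * ||theta||^2 (hypothetical test problem)."""
    return theta


theta = np.array([1.0, -2.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for step in range(1, 501):
    theta, m, v = adam_step(theta, m, v, quadratic_grad, step, lr=0.01)

print(np.linalg.norm(theta))  # distance from the minimizer after 500 steps
```

In this picture the learning rate `lr` plays the role of the time step, which is consistent with the abstract's statement that the ODE arises in the limit of very small learning rates. A higher-order method in the paper's sense would replace this single-stage update with a multi-stage one; that construction is not reproduced here.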
Taxonomy
Topics: Simulation Techniques and Applications · Neural Networks and Applications · Model Reduction and Neural Networks
Methods: Adam
