Stabilizing Backpropagation in 16-bit Neural Training with Modified Adam Optimizer
Juyoung Yun

TL;DR
This paper investigates numerical instability issues in 16-bit neural network training with Adam, identifies epsilon as a key factor, and proposes modifications to improve stability for reliable low-precision deep learning.
Contribution
It introduces a novel approach that adjusts Adam's epsilon hyperparameter and leverages optimizer updates to enhance 16-bit training stability.
Findings
Adjusted epsilon improves 16-bit training stability
Proposed method enables reliable low-precision neural network training
Provides insights into optimization challenges in low-precision computations
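One way to see why epsilon matters at 16-bit precision is to check whether a typical epsilon can be represented in IEEE half precision at all. The sketch below assumes the value in question is 1e-8, the epsilon suggested in the original Adam paper and used as a default in several libraries. The summary above does not state which value the author studied, so treat that number and all variable names here as illustrative.

```python
import numpy as np

# Limits of IEEE half precision (float16).
fp16 = np.finfo(np.float16)
smallest_normal = float(fp16.tiny)                       # 2**-14
smallest_subnormal = float(np.float16(2.0 ** -24))       # smallest positive float16

# Assumed epsilon: 1e-8 (a common Adam default, not taken from the paper).
eps_default = 1e-8
eps_as_fp16 = np.float16(eps_default)

# True when the cast rounds epsilon all the way down to zero.
eps_underflows = bool(eps_as_fp16 == 0)

print("smallest normal float16:   ", smallest_normal)
print("smallest subnormal float16:", smallest_subnormal)
print("1e-8 stored as float16:    ", eps_as_fp16)
print("epsilon underflows to zero:", eps_underflows)
```

Because 1e-8 lies below the smallest positive float16 value, an epsilon of that size no longer keeps the denominator of the Adam update away from zero once it is stored in half precision.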
Abstract
In this research, we address critical concerns related to the numerical instability observed in 16-bit computations of machine learning models. Such instability, particularly when employing popular optimization algorithms like Adam, often leads to unstable training of deep neural networks. This not only disrupts the learning process but also poses significant challenges in deploying dependable models in real-world applications. Our investigation identifies the epsilon hyperparameter as the primary source of this instability. A nuanced exploration reveals that subtle adjustments to epsilon within 16-bit computations can enhance the numerical stability of Adam, enabling more stable training of 16-bit neural networks. We propose a novel, dependable approach that leverages updates from the Adam optimizer to bolster the stability of the learning process. Our contributions provide deeper…
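The instability described in the abstract can be reproduced with a single textbook Adam step computed entirely in float16. The sketch below is not the author's modified optimizer, which this summary does not spell out. It only contrasts an assumed default epsilon of 1e-8 with a larger, hypothetical epsilon of 1e-4, chosen here solely because it is representable in float16. The names `adam_step_fp16` and `F16` are made up for this example.

```python
import numpy as np

F16 = np.float16


def adam_step_fp16(grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam step with every quantity held in float16."""
    # Cast all hyperparameters so no intermediate is promoted to float32/64.
    lr, beta1, beta2, eps = F16(lr), F16(beta1), F16(beta2), F16(eps)
    one, t16 = F16(1), F16(t)
    grad = np.asarray(grad, dtype=F16)

    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (one - beta1) * grad
    v = beta2 * v + (one - beta2) * (grad * grad)  # grad**2 can underflow to 0

    # Bias correction.
    m_hat = m / (one - beta1 ** t16)
    v_hat = v / (one - beta2 ** t16)

    # Parameter update; silence NumPy warnings so the result can be inspected.
    with np.errstate(divide="ignore", invalid="ignore"):
        update = lr * (m_hat / (np.sqrt(v_hat) + eps))
    return update, m, v


# A small but representable gradient whose square underflows in float16.
grad = np.array([1e-4], dtype=F16)
zeros = np.zeros(1, dtype=F16)

update_small_eps, _, _ = adam_step_fp16(grad, zeros, zeros, t=1, eps=1e-8)
update_large_eps, _, _ = adam_step_fp16(grad, zeros, zeros, t=1, eps=1e-4)

print("eps=1e-8 update finite:", bool(np.isfinite(update_small_eps).all()))
print("eps=1e-4 update finite:", bool(np.isfinite(update_large_eps).all()))
```

With the small epsilon, both the squared gradient and epsilon itself round to zero, so the update divides by zero. With the larger epsilon the denominator stays positive and the update stays finite. This illustrates the failure mode only. The appropriate epsilon for real training is a tuning question the sketch does not answer.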
Taxonomy
Topics: Model Reduction and Neural Networks · Numerical Methods and Algorithms · Neural Networks and Applications
Methods: Adam · RMSProp
