Adaptive Loss Scaling for Mixed Precision Training
Ruizhe Zhao, Brian Vogel, Tanvir Ahmed

TL;DR
This paper proposes an adaptive loss scaling method for mixed precision training that automatically adjusts layer-wise loss scales during training, eliminating the need for manual hyperparameter tuning and enhancing training efficiency and accuracy.
Contribution
It introduces a novel adaptive loss scaling technique with layer-wise adjustments, improving mixed precision training's practicality and performance.
Findings
Reduces training time to convergence
Improves model accuracy
Eliminates hyperparameter tuning for loss scale
Abstract
Mixed precision training (MPT) is becoming a practical technique to improve the speed and energy efficiency of training deep neural networks by leveraging the fast hardware support for IEEE half-precision floating point that is available in existing GPUs. MPT is typically used in combination with a technique called loss scaling, which works by scaling the loss value up before the start of backpropagation in order to minimize the impact of numerical underflow on training. Unfortunately, existing methods make this loss scale value a hyperparameter that needs to be tuned per model, and a single scale cannot be adapted to different layers at different training stages. We introduce a loss scaling-based training method called adaptive loss scaling that makes MPT easier and more practical to use by removing the need to tune a model-specific loss scale hyperparameter. We achieve this by…
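To make the underflow problem concrete, here is a minimal sketch of fixed loss scaling, the baseline the abstract describes. It is not the paper's adaptive method. It uses only the Python standard library to round values to IEEE half precision. The gradient magnitudes, the two scale values, and the helper name to_fp16 are hypothetical, chosen for illustration: with one shared scale, either the small gradient underflows or the large one overflows.

```python
import struct

def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE half-precision value (hypothetical helper).

    Values too large for half precision are mapped to +/-inf, which is what
    fp16 hardware would produce; struct raises OverflowError instead.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return float("inf") if x > 0 else float("-inf")

# Hypothetical gradient magnitudes for two layers of the same model.
small_grad = 1e-12   # far below the smallest fp16 subnormal (about 6e-8)
large_grad = 10.0

def backprop_in_fp16(grad: float, loss_scale: float) -> float:
    """Scale the gradient, store it in fp16, then unscale in full precision.

    Multiplying the loss by loss_scale multiplies every gradient by the same
    factor (chain rule), so scaling the gradient directly is equivalent here.
    """
    return to_fp16(grad * loss_scale) / loss_scale

low_scale = 2.0 ** 10    # hypothetical scale: too small for small_grad
high_scale = 2.0 ** 20   # hypothetical scale: too large for large_grad

small_at_low = backprop_in_fp16(small_grad, low_scale)    # underflows to 0.0
large_at_low = backprop_in_fp16(large_grad, low_scale)    # preserved
small_at_high = backprop_in_fp16(small_grad, high_scale)  # roughly preserved
large_at_high = backprop_in_fp16(large_grad, high_scale)  # overflows to inf

print("scale 2^10:", small_at_low, large_at_low)
print("scale 2^20:", small_at_high, large_at_high)
```

In this toy setting no single scale keeps both gradients representable, which is the motivation the abstract gives for choosing scales per layer rather than per model.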
Taxonomy
Topics: Model Reduction and Neural Networks · Neural Networks and Applications · Advanced Neural Network Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Adaptive Robust Loss
