GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks
Zhao Chen, Vijay Badrinarayanan, Chen-Yu Lee, Andrew Rabinovich

TL;DR
GradNorm is a gradient normalization technique that adaptively balances multitask neural network training, improving accuracy and reducing overfitting across various tasks with minimal hyperparameter tuning.
Contribution
Introduces GradNorm, a simple yet effective gradient normalization method for automatic loss balancing in deep multitask networks, outperforming static baselines and matching or surpassing exhaustive grid search.
Findings
GradNorm improves accuracy across multiple tasks.
It reduces overfitting compared to baseline methods.
It matches or surpasses exhaustive grid search performance.
Abstract
Deep multitask networks, in which one neural network produces multiple predictive outputs, can offer better speed and performance than their single-task counterparts but are challenging to train properly. We present a gradient normalization (GradNorm) algorithm that automatically balances training in deep multitask models by dynamically tuning gradient magnitudes. We show that for various network architectures, for both regression and classification tasks, and on both synthetic and real datasets, GradNorm improves accuracy and reduces overfitting across multiple tasks when compared to single-task networks, static baselines, and other adaptive multitask loss balancing techniques. GradNorm also matches or surpasses the performance of exhaustive grid search methods, despite only involving a single asymmetry hyperparameter α. Thus, what was once a tedious search process that incurred…
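The abstract does not spell out the update rule, so the following is only a minimal sketch of the idea of "dynamically tuning gradient magnitudes", under stated assumptions. Each task's weighted gradient norm is pulled toward a common target that is scaled up for tasks that are training more slowly, with the asymmetry exponent `alpha` controlling how strongly. The function name `gradnorm_step`, its arguments, the default values of `alpha` and `lr`, and the small positive clamp are all hypothetical choices made for illustration. The per-task gradient norms are passed in as plain numbers, whereas a real training loop would compute them with an autodiff framework.

```python
def gradnorm_step(weights, grad_norms, losses, initial_losses, alpha=1.5, lr=0.025):
    """One illustrative GradNorm-style update of the task loss weights.

    weights[i]        : current loss weight w_i for task i
    grad_norms[i]     : L2 norm of the gradient of the *unweighted* loss L_i
                        with respect to the shared layer (assumed precomputed)
    losses[i]         : current loss L_i(t)
    initial_losses[i] : loss at the start of training, L_i(0)
    alpha, lr         : placeholder defaults, not tuned values
    """
    n = len(weights)

    # Weighted gradient norm per task: G_i = w_i * ||grad L_i||.
    G = [w * g for w, g in zip(weights, grad_norms)]
    mean_G = sum(G) / n

    # Relative inverse training rate: r_i = (L_i(t) / L_i(0)) / mean over tasks.
    ratios = [l / l0 for l, l0 in zip(losses, initial_losses)]
    mean_ratio = sum(ratios) / n
    r = [x / mean_ratio for x in ratios]

    # Target gradient norm per task, treated as a constant during the update.
    targets = [mean_G * ri ** alpha for ri in r]

    def sign(x):
        return (x > 0) - (x < 0)

    # Derivative of sum_i |G_i - target_i| with respect to w_i is
    # sign(G_i - target_i) * ||grad L_i||; take one gradient descent step.
    new_w = [
        w - lr * sign(Gi - t) * g
        for w, Gi, t, g in zip(weights, G, targets, grad_norms)
    ]

    # Assumption: clamp to keep weights positive before renormalizing.
    new_w = [max(w, 1e-8) for w in new_w]

    # Renormalize so the weights sum to the number of tasks.
    total = sum(new_w)
    return [n * w / total for w in new_w]
```

With equal loss ratios, a task whose gradient norm dominates has its weight reduced. With equal gradient norms, a task whose loss has dropped less receives a larger weight.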
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
