GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks

Zhao Chen; Vijay Badrinarayanan; Chen-Yu Lee; Andrew Rabinovich

arXiv:1711.02257 · cs.CV · July 16, 2018 · 452 cites

TL;DR

GradNorm is a gradient normalization technique that adaptively balances multitask neural network training, improving accuracy and reducing overfitting across multiple tasks while requiring only a single asymmetry hyperparameter.

Contribution

Introduces GradNorm, a simple yet effective gradient normalization method for automatic loss balancing in deep multitask networks that outperforms static baselines and matches or surpasses exhaustive grid search.

Findings

1. GradNorm improves accuracy across multiple tasks.
2. It reduces overfitting compared to baseline methods.
3. It matches or surpasses exhaustive grid search performance.

Abstract

Deep multitask networks, in which one neural network produces multiple predictive outputs, can offer better speed and performance than their single-task counterparts but are challenging to train properly. We present a gradient normalization (GradNorm) algorithm that automatically balances training in deep multitask models by dynamically tuning gradient magnitudes. We show that for various network architectures, for both regression and classification tasks, and on both synthetic and real datasets, GradNorm improves accuracy and reduces overfitting across multiple tasks when compared to single-task networks, static baselines, and other adaptive multitask loss balancing techniques. GradNorm also matches or surpasses the performance of exhaustive grid search methods, despite only involving a single asymmetry hyperparameter $\alpha$. Thus, what was once a tedious search process that incurred…
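
The abstract does not spell out how gradient magnitudes are tuned. The following is a minimal sketch, assuming the formulation in the paper's method section, which this page does not reproduce. Each task's gradient norm on the shared weights is pulled toward the mean norm scaled by that task's relative inverse training rate raised to the power $\alpha$, and the mismatch is penalized with an L1 loss. The function names `gradnorm_targets` and `gradnorm_loss` are hypothetical, and plain Python lists stand in for tensors. In a real training loop the gradient norms would come from an autograd framework and the targets would be treated as constants.

```python
def gradnorm_targets(grad_norms, losses, initial_losses, alpha):
    """Per-task target gradient norms (hypothetical helper, illustrative only).

    grad_norms:      norm of each weighted task loss's gradient on the shared weights
    losses:          current loss value of each task
    initial_losses:  loss value of each task at the start of training
    alpha:           asymmetry hyperparameter; 0 pulls every task to the same norm
    """
    n = len(grad_norms)
    # Loss ratio: a smaller value means the task has trained faster so far.
    loss_ratios = [cur / init for cur, init in zip(losses, initial_losses)]
    mean_ratio = sum(loss_ratios) / n
    # Relative inverse training rate: above 1 means the task is lagging behind.
    rel_rates = [ratio / mean_ratio for ratio in loss_ratios]
    mean_norm = sum(grad_norms) / n
    # Lagging tasks receive larger target norms; alpha sets how strongly.
    return [mean_norm * (rate ** alpha) for rate in rel_rates]


def gradnorm_loss(grad_norms, targets):
    """L1 distance between the actual and target gradient norms."""
    return sum(abs(g - t) for g, t in zip(grad_norms, targets))


# Two tasks: the first has halved its loss, the second has not improved.
norms = [2.0, 1.0]
targets = gradnorm_targets(norms, losses=[0.5, 1.0],
                           initial_losses=[1.0, 1.0], alpha=1.0)
print(targets, gradnorm_loss(norms, targets))
```

In this sketch the slower second task receives the larger target norm, so minimizing the loss with respect to the task weights would raise its weight relative to the first task.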

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings