Regularizing Deep Multi-Task Networks using Orthogonal Gradients
Mihai Suteu, Yike Guo

TL;DR
This paper introduces a gradient regularization technique that enforces near-orthogonal task gradients in the shared parameters of multi-task neural networks, reducing task interference and improving performance across multiple datasets.
Contribution
The paper proposes a new gradient regularization method that encourages orthogonal gradients to mitigate task interference in multi-task learning.
Findings
Well-performing models show low variance in the angles between task gradients.
Popular regularization methods implicitly reduce this gradient angle variance.
The proposed method achieves competitive results on multiple datasets.
Abstract
Deep neural networks are a promising approach towards multi-task learning because of their capability to leverage knowledge across domains and learn general purpose representations. Nevertheless, they can fail to live up to these promises as tasks often compete for a model's limited resources, potentially leading to lower overall performance. In this work we tackle the issue of interfering tasks through a comprehensive analysis of their training, derived from looking at the interaction between gradients within their shared parameters. Our empirical results show that well-performing models have low variance in the angles between task gradients and that popular regularization methods implicitly reduce this measure. Based on this observation, we propose a novel gradient regularization term that minimizes task interference by enforcing near orthogonal gradients. Updating the shared…
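The two quantities the abstract refers to, the variance of the angles between task gradients and a penalty that pushes those gradients toward orthogonality, can be illustrated with a small NumPy sketch. This is not the paper's implementation: the excerpt does not give the exact form of the regularization term, so the squared-cosine penalty below is an assumption, and the function names (`pairwise_cosines`, `angle_variance`, `orthogonality_penalty`) are hypothetical. Each row of `grads` is assumed to be one task's gradient with respect to the shared parameters, flattened into a vector.

```python
import numpy as np


def pairwise_cosines(grads):
    """Cosine similarity for every unordered pair of task gradients.

    grads: array-like of shape (num_tasks, num_shared_params).
    Returns a 1-D array with one entry per task pair.
    """
    g = np.asarray(grads, dtype=float)
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    unit = g / np.maximum(norms, 1e-12)  # guard against zero gradients
    cos = unit @ unit.T
    i, j = np.triu_indices(g.shape[0], k=1)  # upper triangle, no diagonal
    return cos[i, j]


def angle_variance(grads):
    """Variance (in radians squared) of the pairwise gradient angles."""
    cos = np.clip(pairwise_cosines(grads), -1.0, 1.0)
    return float(np.var(np.arccos(cos)))


def orthogonality_penalty(grads):
    """Assumed penalty: sum of squared pairwise cosines.

    It is zero exactly when all task gradients are mutually orthogonal.
    """
    return float(np.sum(pairwise_cosines(grads) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    task_grads = rng.normal(size=(3, 8))  # 3 tasks, 8 shared parameters
    print("angle variance:", angle_variance(task_grads))
    print("orthogonality penalty:", orthogonality_penalty(task_grads))
```

In a training loop, a term like this would be scaled by a coefficient and added to the sum of task losses, which requires differentiating through the gradients themselves in an autodiff framework; the NumPy version above only measures the quantities.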
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques
