Regularizing Deep Multi-Task Networks using Orthogonal Gradients

Mihai Suteu; Yike Guo

arXiv:1912.06844·cs.LG·December 17, 2019·30 cites

TL;DR

This paper introduces a gradient regularization technique that enforces near-orthogonal task gradients in the shared parameters of multi-task neural networks, reducing task interference and improving performance across various datasets.

Contribution

The paper proposes a new gradient regularization method that encourages orthogonal gradients to mitigate task interference in multi-task learning.

Findings

01 Well-performing models show low variance in the angles between task gradients.

02 Popular regularization methods implicitly reduce this gradient angle variance.

03 The proposed method achieves competitive results on multiple datasets.
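
The gradient angle variance mentioned in the findings can be illustrated with a minimal sketch. It assumes each task's gradient with respect to the shared parameters has already been flattened into a list of floats, and that the angles from all task pairs and all recorded training steps are pooled before taking the variance. The function names `angle_between` and `gradient_angle_variance` are hypothetical, and the pooling choice is an assumption, since this summary does not state how the paper aggregates angles.

```python
import math
from itertools import combinations
from statistics import pvariance


def angle_between(g1, g2):
    """Angle in radians between two flattened gradient vectors.

    Assumes both vectors are non-zero; a zero gradient has no defined angle.
    """
    dot = sum(a * b for a, b in zip(g1, g2))
    norm1 = math.sqrt(sum(a * a for a in g1))
    norm2 = math.sqrt(sum(b * b for b in g2))
    cos = dot / (norm1 * norm2)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos = max(-1.0, min(1.0, cos))
    return math.acos(cos)


def gradient_angle_variance(steps):
    """Population variance of pairwise task-gradient angles.

    steps: list of training steps, where each step is a list holding one
    flattened gradient vector per task (hypothetical layout).
    """
    angles = [
        angle_between(g1, g2)
        for task_grads in steps
        for g1, g2 in combinations(task_grads, 2)
    ]
    return pvariance(angles)


# Two tasks, two recorded steps; the gradients stay orthogonal in both steps.
steps = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 2.0], [3.0, 0.0]],
]
variance = gradient_angle_variance(steps)
```

In this toy input the angle is the same at both steps, so the variance is zero; gradients whose relative direction changes from step to step would give a larger value.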

Abstract

Deep neural networks are a promising approach towards multi-task learning because of their capability to leverage knowledge across domains and learn general purpose representations. Nevertheless, they can fail to live up to these promises as tasks often compete for a model's limited resources, potentially leading to lower overall performance. In this work we tackle the issue of interfering tasks through a comprehensive analysis of their training, derived from looking at the interaction between gradients within their shared parameters. Our empirical results show that well-performing models have low variance in the angles between task gradients and that popular regularization methods implicitly reduce this measure. Based on this observation, we propose a novel gradient regularization term that minimizes task interference by enforcing near orthogonal gradients. Updating the shared…
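
One way to picture a regularization term that favors near-orthogonal task gradients is a penalty on the squared cosine between each pair of task gradients. The sketch below is an assumption for illustration only: the abstract is truncated here and does not give the exact form of the term, so the squared-cosine choice, the weight `alpha`, and the names `orthogonality_penalty` and `regularized_loss` are all hypothetical. It only evaluates the penalty for given gradient vectors; in real training the term would have to be differentiated through the gradients with an automatic differentiation library.

```python
import math
from itertools import combinations


def cosine(g1, g2):
    """Cosine similarity between two non-zero flattened gradient vectors."""
    dot = sum(a * b for a, b in zip(g1, g2))
    norm1 = math.sqrt(sum(a * a for a in g1))
    norm2 = math.sqrt(sum(b * b for b in g2))
    return dot / (norm1 * norm2)


def orthogonality_penalty(task_grads):
    """Sum of squared cosines over all task pairs (assumed form).

    The value is zero when every pair of task gradients is orthogonal and
    grows as gradients become aligned or directly opposed.
    """
    return sum(cosine(g1, g2) ** 2 for g1, g2 in combinations(task_grads, 2))


def regularized_loss(task_losses, task_grads, alpha=0.1):
    """Sum of task losses plus the weighted penalty (alpha is hypothetical)."""
    return sum(task_losses) + alpha * orthogonality_penalty(task_grads)


# Orthogonal gradients incur no penalty; parallel gradients incur the
# maximum penalty for a single pair.
orthogonal = orthogonality_penalty([[1.0, 0.0], [0.0, 1.0]])
parallel = orthogonality_penalty([[1.0, 1.0], [2.0, 2.0]])
```

Squaring the cosine treats aligned and opposed gradients the same way, which matches the stated goal of orthogonality rather than agreement between tasks.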

Code & Models

Repositories

shaohua0116/MultiDigitMNIST

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques