Adapting Auxiliary Losses Using Gradient Similarity
Yunshu Du, Wojciech M. Czarnecki, Siddhant M. Jayakumar, Mehrdad Farajtabar, Razvan Pascanu, Balaji Lakshminarayanan

TL;DR
This paper introduces a method that uses gradient cosine similarity to adaptively weight auxiliary losses, improving neural network training by identifying when auxiliary tasks are beneficial.
Contribution
The paper proposes an adaptive weighting scheme for auxiliary losses based on gradient similarity, proves that it converges to critical points of the main task, and shows that it is effective in practice across several domains.
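One way to write the weighting scheme as an update rule is shown below. This is our reading of the abstract, not a formula quoted from the paper: it assumes plain gradient descent with step size $\alpha$ and a weight $w_t$ equal to the cosine similarity clipped at zero. The last line shows why such a rule never works against the main task: the combined step always has an inner product with the main gradient of at least $\lVert g_m \rVert^2$, because $w_t > 0$ only when $g_m^\top g_a > 0$.

```latex
\begin{aligned}
g_m &= \nabla_\theta \mathcal{L}_{\text{main}}(\theta_t), \qquad
g_a = \nabla_\theta \mathcal{L}_{\text{aux}}(\theta_t), \qquad
w_t = \max\!\left(0, \frac{g_m^\top g_a}{\lVert g_m \rVert \, \lVert g_a \rVert}\right) \\
\theta_{t+1} &= \theta_t - \alpha \, ( g_m + w_t \, g_a ) \\
g_m^\top ( g_m + w_t \, g_a ) &= \lVert g_m \rVert^2 + w_t \, g_m^\top g_a \;\ge\; \lVert g_m \rVert^2
\end{aligned}
```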
Findings
Improves multi-task learning on ImageNet subsets
Enhances reinforcement learning performance in gridworld and Atari
Guarantees convergence to critical points of the main task
Abstract
One approach to deal with the statistical inefficiency of neural networks is to rely on auxiliary losses that help to build useful representations. However, it is not always trivial to know if an auxiliary task will be helpful for the main task and when it could start hurting. We propose to use the cosine similarity between gradients of tasks as an adaptive weight to detect when an auxiliary loss is helpful to the main loss. We show that our approach is guaranteed to converge to critical points of the main task and demonstrate the practical usefulness of the proposed algorithm in a few domains: multi-task supervised learning on subsets of ImageNet, reinforcement learning on gridworld, and reinforcement learning on Atari games.
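The procedure described in the abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the auxiliary gradient is scaled by the cosine similarity clipped at zero, and the helper names (`cosine_similarity`, `combined_gradient`, `quadratic_grad`, `train`) are hypothetical. Instead of the paper's benchmarks it uses a toy problem with two quadratic losses whose minima differ, so the auxiliary task helps at first and then starts to hurt.

```python
import math
from typing import List, Sequence

# Minimal sketch (assumption): weight = max(0, cos(g_main, g_aux)).
# All helper names below are hypothetical, chosen for illustration.
Vector = List[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float], eps: float = 1e-12) -> float:
    """Cosine of the angle between two flattened gradient vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)  # eps avoids division by zero


def combined_gradient(g_main: Sequence[float], g_aux: Sequence[float]) -> Vector:
    """Return g_main + max(0, cos(g_main, g_aux)) * g_aux."""
    weight = max(0.0, cosine_similarity(g_main, g_aux))
    return [gm + weight * ga for gm, ga in zip(g_main, g_aux)]


def quadratic_grad(theta: Sequence[float], center: Sequence[float]) -> Vector:
    """Gradient of the toy loss 0.5 * ||theta - center||^2."""
    return [t - c for t, c in zip(theta, center)]


def train(theta0, main_center, aux_center, steps=200, lr=0.1, adaptive=True) -> Vector:
    """Plain gradient descent on a main loss plus an auxiliary loss."""
    theta = list(theta0)
    for _ in range(steps):
        g_main = quadratic_grad(theta, main_center)
        g_aux = quadratic_grad(theta, aux_center)
        if adaptive:
            # Auxiliary gradient is used only while it agrees with the main one.
            g = combined_gradient(g_main, g_aux)
        else:
            # Baseline: auxiliary loss always added with a fixed weight of 1.
            g = [gm + ga for gm, ga in zip(g_main, g_aux)]
        theta = [t - lr * gi for t, gi in zip(theta, g)]
    return theta


if __name__ == "__main__":
    start, main_opt, aux_opt = [3.0, 3.0], [0.0, 0.0], [1.0, 1.0]
    print("adaptive:", train(start, main_opt, aux_opt, adaptive=True))
    print("fixed:   ", train(start, main_opt, aux_opt, adaptive=False))
```

Starting from (3, 3) with the main optimum at (0, 0) and the auxiliary optimum at (1, 1), the adaptive run ends essentially at (0, 0): once the iterate passes (1, 1), the two gradients point in opposite directions, the weight drops to zero, and only the main loss is optimized. The fixed-weight baseline settles at (0.5, 0.5), the minimizer of the summed loss, which is not a critical point of the main task.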
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning
