Feed m Birds with One Scone: Accelerating Multi-task Gradient Balancing via Bi-level Optimization
Xuxing Chen, Yun He, Jiayi Xu, Minhui Huang, Xiaoyi Liu, Boyang Liu, Fei Tian, Xiaohan Wei, Rong Jin, Sem Park, Bo Long, Xue Feng

TL;DR
This paper introduces MARIGOLD, a bi-level optimization framework that efficiently balances gradients in multi-task learning, improving performance and reducing computational costs compared to existing methods.
Contribution
The paper proposes a novel bi-level optimization approach for multi-task gradient balancing, addressing the computational inefficiency of prior gradient-based methods such as MGDA, which require access to all task gradients.
Findings
MARIGOLD outperforms existing methods on public datasets.
It reduces computational costs in multi-task learning.
It demonstrates effectiveness on industrial-scale datasets.
Abstract
In machine learning, the goal of multi-task learning (MTL) is to optimize multiple objectives together. Recent works, such as the Multiple Gradient Descent Algorithm (MGDA) and its variants, show promising results by dynamically adjusting the weights of different tasks to mitigate conflicts that may degrade performance on certain tasks. Despite the empirical success of MGDA-type methods, one major limitation is their computational inefficiency, as they require access to all task gradients. In this paper, we introduce MARIGOLD, a unified algorithmic framework for efficiently solving MTL problems. Our method reveals that multi-task gradient balancing methods have a hierarchical structure, in which the model training and the gradient balancing are coupled during the whole optimization process and can be viewed as a bi-level optimization problem. Moreover, we…
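For context on the "dynamically adjusted weights" mentioned in the abstract, here is a minimal sketch of the classic two-task MGDA weighting step. It is not MARIGOLD itself. It assumes each task gradient is available as a flat list of floats, and the function names `min_norm_weights` and `combine` are hypothetical, chosen for illustration. The step needs both task gradients at once, which is the cost the abstract identifies in MGDA-type methods.

```python
def min_norm_weights(g1, g2):
    """Return (a, 1 - a) minimizing ||a*g1 + (1 - a)*g2||^2 over a in [0, 1].

    This is the closed-form two-task case of the MGDA min-norm problem.
    """
    diff = [x - y for x, y in zip(g1, g2)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        # Identical gradients: any convex combination gives the same vector.
        return 0.5, 0.5
    # Unconstrained minimizer: a = <g2 - g1, g2> / ||g1 - g2||^2
    a = sum((y - x) * y for x, y in zip(g1, g2)) / denom
    # Project onto [0, 1] so the weights stay a convex combination.
    a = min(1.0, max(0.0, a))
    return a, 1.0 - a


def combine(g1, g2):
    """Weighted sum of the two task gradients using the min-norm weights."""
    a, b = min_norm_weights(g1, g2)
    return [a * x + b * y for x, y in zip(g1, g2)]
```

When the two gradients point in the same direction, the weights collapse onto the shorter one. When they conflict, the combined direction shrinks toward zero, which is how the weighting keeps one task from dominating the update.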
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications
