GDOD: Effective Gradient Descent using Orthogonal Decomposition for Multi-Task Learning

Xin Dong; Ruize Wu; Chao Xiong; Hai Li; Lei Cheng; Yong He; Shiyou Qian; Jian Cao; Linjian Mo

arXiv:2301.13465 · cs.LG · February 1, 2023

TL;DR

GDOD introduces an optimization method for multi-task learning based on orthogonal decomposition of task gradients, managing gradient conflicts and improving performance across multiple datasets.

Contribution

The paper proposes GDOD, a novel gradient manipulation technique using orthogonal basis decomposition to enhance multi-task learning optimization.
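
The decomposition can be written out as follows. This is a hedged reconstruction from the summary, and the notation is assumed here rather than taken from the paper. $g_i$ is the gradient of task $i$, and $u_1, \dots, u_r$ is an orthonormal basis of the span of all task gradients. A basis direction is treated as task-shared when no two tasks have coefficients of strictly opposite sign along it. The paper's exact criterion may differ.

```latex
% Assumed notation: T task gradients g_i \in \mathbb{R}^d,
% u_1, \dots, u_r an orthonormal basis of \operatorname{span}\{g_1, \dots, g_T\}
\begin{aligned}
g_i &= \sum_{j=1}^{r} c_{ij}\, u_j, \qquad c_{ij} = \langle g_i, u_j \rangle, \qquad i = 1, \dots, T \\
\mathcal{S} &= \{\, j : c_{ij}\, c_{kj} \ge 0 \ \text{for all } i, k \,\}, \qquad
\mathcal{C} = \{1, \dots, r\} \setminus \mathcal{S} \\
g_i &= \underbrace{\sum_{j \in \mathcal{S}} c_{ij}\, u_j}_{\text{task-shared}}
     + \underbrace{\sum_{j \in \mathcal{C}} c_{ij}\, u_j}_{\text{task-conflict}}
\end{aligned}
```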

Findings

1. GDOD significantly outperforms existing MTL models in AUC and Logloss.
2. GDOD explicitly decomposes gradients into task-shared and task-conflict components.
3. The convergence of GDOD is proven theoretically under both convex and non-convex assumptions.

Abstract

Multi-task learning (MTL) aims to solve multiple related tasks simultaneously and has grown rapidly in recent years. However, MTL models often suffer from performance degradation caused by negative transfer when several tasks are learned simultaneously. Some related work attributes the source of this problem to conflicting gradients, in which case useful gradient updates must be selected carefully for all tasks. To this end, we propose a novel optimization approach for MTL, named GDOD, which manipulates the gradient of each task using an orthogonal basis decomposed from the span of all task gradients. GDOD explicitly decomposes gradients into task-shared and task-conflict components and adopts a general update rule that avoids interference across all task gradients. This allows the update directions to be guided by the task-shared components. Moreover, we prove the…
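
The following minimal Python sketch illustrates how such an update rule could look. It works under stated assumptions and is not the authors' implementation. It makes three assumptions:

(1) The orthonormal basis is built by modified Gram-Schmidt over the task gradients.
(2) A basis direction counts as task-shared when no two tasks have coordinates of strictly opposite sign along it.
(3) The update sums all tasks' coordinates along the shared directions and discards the conflicting ones.

The function names `orthonormal_basis` and `shared_update` are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(grads: List[Vector], tol: float = 1e-10) -> List[Vector]:
    """Orthonormal basis of span(grads) via modified Gram-Schmidt (an assumed choice).

    Gradients that are (numerically) linearly dependent on earlier ones are skipped.
    """
    basis: List[Vector] = []
    for g in grads:
        v = list(g)
        for u in basis:
            p = dot(v, u)  # remove the current component of v along u
            v = [vi - p * ui for vi, ui in zip(v, u)]
        norm = dot(v, v) ** 0.5
        if norm > tol:
            basis.append([vi / norm for vi in v])
    return basis


def shared_update(grads: List[Vector], tol: float = 1e-12) -> Vector:
    """Hypothetical GDOD-style direction built only from task-shared components.

    Assumed rule: along each basis direction u_j, compute c_ij = <g_i, u_j>.
    If one task has c_ij > 0 and another has c_ij < 0, the direction is
    task-conflict and is dropped; otherwise it is task-shared and the summed
    coefficient sum_i c_ij is added along u_j.
    """
    d = [0.0] * len(grads[0])
    for u in orthonormal_basis(grads):
        coeffs = [dot(g, u) for g in grads]
        has_pos = any(c > tol for c in coeffs)
        has_neg = any(c < -tol for c in coeffs)
        if has_pos and has_neg:
            continue  # task-conflict direction: excluded from the update
        total = sum(coeffs)
        d = [di + total * ui for di, ui in zip(d, u)]
    return d


if __name__ == "__main__":
    g1, g2 = [2.0, 0.0], [-1.0, 1.0]
    print(shared_update([g1, g2]))  # [0.0, 1.0]; the naive sum would be [1.0, 1.0]
```

Take g1 = [2, 0] and g2 = [-1, 1]. The first basis direction [1, 0] carries coefficients 2 and -1, so it is dropped as task-conflicting. The second direction [0, 1] carries 0 and 1 and is kept, which gives [0.0, 1.0].

Every kept direction has coefficients of a single sign. As a result, each task's inner product with the result is a sum of non-negative terms: for task k it is the sum over kept directions j of c_kj times the sum of all tasks' coefficients on j. A descent step along this direction therefore does not increase any task's loss to first order.

Gram-Schmidt makes the basis depend on the order of the tasks. An SVD of the gradient matrix is an order-independent alternative. This summary does not say which construction GDOD itself uses.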