Can Optimization Trajectories Explain Multi-Task Transfer?
David Mueller, Mark Dredze, Nicholas Andrews

TL;DR
This paper investigates how multi-task learning (MTL) influences optimization trajectories and generalization. It finds that factors such as gradient conflict correlate with optimization difficulties but do not fully explain the generalization gap between single-task and multi-task training.
Contribution
The study empirically analyzes the impact of multi-task learning on optimization trajectories and challenges existing explanations for generalization gaps in MTL.
Findings
MTL causes a generalization gap early in training.
Gradient conflict correlates with optimization difficulties.
Existing trajectory-based explanations do not fully account for generalization gaps.
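This summary does not say how the paper measures gradient conflict. A common convention in the MTL literature treats two task gradients on the shared parameters as conflicting when their cosine similarity is negative. The sketch below assumes that convention. The helper names and the gradient values are hypothetical and for illustration only.

```python
import math
from typing import Sequence


def cosine_similarity(g1: Sequence[float], g2: Sequence[float]) -> float:
    """Cosine similarity between two flattened gradient vectors."""
    dot = sum(a * b for a, b in zip(g1, g2))
    n1 = math.sqrt(sum(a * a for a in g1))
    n2 = math.sqrt(sum(b * b for b in g2))
    if n1 == 0.0 or n2 == 0.0:
        # A zero gradient is treated as neither aligned nor conflicting.
        return 0.0
    return dot / (n1 * n2)


def is_conflicting(g1: Sequence[float], g2: Sequence[float]) -> bool:
    """Assumed convention: gradients conflict when their cosine similarity is negative."""
    return cosine_similarity(g1, g2) < 0.0


# Hypothetical per-task gradients with respect to the shared parameters.
grad_task_a = [1.0, 0.5, -0.2]
grad_task_b = [-0.8, 0.1, 0.3]
print(cosine_similarity(grad_task_a, grad_task_b))
print(is_conflicting(grad_task_a, grad_task_b))
```

In practice, the vectors would be the per-task gradients of the shared parameters, flattened, at a given training step. Tracking this quantity over a trajectory is one way to relate conflict to optimization difficulty.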
Abstract
Despite the widespread adoption of multi-task training in deep learning, little is understood about how multi-task learning (MTL) affects generalization. Prior work has conjectured that the negative effects of MTL are due to optimization challenges that arise during training, and many optimization methods have been proposed to improve multi-task performance. However, recent work has shown that these methods fail to consistently improve multi-task generalization. In this work, we seek to improve our understanding of these failures by empirically studying how MTL impacts the optimization of tasks, and whether this impact can explain the effects of MTL on generalization. We show that MTL results in a generalization gap (a gap in generalization at comparable training loss) between single-task and multi-task trajectories early into training. However, we find that factors of the optimization…
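The abstract defines the generalization gap as a gap in generalization at comparable training loss. One way to operationalize this is to record (training loss, validation loss) checkpoints for the same task under single-task and multi-task training. You then interpolate each trajectory's validation loss at a matched training loss and take the difference. The sketch below makes several assumptions. It uses linear interpolation, it uses a "multi-task minus single-task" sign convention, and its function names are hypothetical rather than the authors' procedure.

```python
from bisect import bisect_left
from typing import List, Tuple

# Hypothetical representation: a list of (training loss, validation loss) checkpoints.
Trajectory = List[Tuple[float, float]]


def val_loss_at(traj: Trajectory, train_loss: float) -> float:
    """Linearly interpolate validation loss at a given training loss."""
    pts = sorted(traj)  # order checkpoints by training loss
    xs = [p[0] for p in pts]
    if not xs[0] <= train_loss <= xs[-1]:
        raise ValueError("training loss lies outside the trajectory's range")
    i = bisect_left(xs, train_loss)
    if xs[i] == train_loss:
        return pts[i][1]
    (x0, y0), (x1, y1) = pts[i - 1], pts[i]
    t = (train_loss - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)


def generalization_gap(single: Trajectory, multi: Trajectory, train_loss: float) -> float:
    """Multi-task minus single-task validation loss at a matched training loss.

    Under this assumed sign convention, a positive value means the multi-task
    run generalizes worse at the same training loss.
    """
    return val_loss_at(multi, train_loss) - val_loss_at(single, train_loss)


# Hypothetical checkpoints for one task (not data from the paper).
single = [(2.0, 2.1), (1.0, 1.3), (0.5, 1.0)]
multi = [(2.0, 2.2), (1.0, 1.6), (0.5, 1.4)]
print(generalization_gap(single, multi, 0.75))
```

Comparing at matched training loss, rather than at matched step, separates how fast each run fits the training data from how well that fit transfers to held-out data.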
