Adaptive Scheduling for Multi-Task Learning
Sébastien Jean, Orhan Firat, Melvin Johnson

TL;DR
This paper investigates adaptive and implicit task scheduling methods for multi-task neural machine translation, aiming to improve low-resource language performance while maintaining overall model quality.
Contribution
It introduces adaptive and implicit scheduling techniques that dynamically balance tasks according to each task's performance relative to its baseline, improving low-resource language translation in multi-task models.
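A minimal sketch of performance-based task sampling follows. It assumes each task has a current dev score and a baseline score (for example BLEU from a single-task model). The weighting rule used here, weight = (baseline / current) ** alpha, is a hypothetical choice for illustration and not necessarily the paper's exact formula. The names `adaptive_task_probs`, `sample_task`, `alpha` and `eps` are likewise hypothetical.

```python
import random
from typing import Dict


def adaptive_task_probs(scores: Dict[str, float],
                        baselines: Dict[str, float],
                        alpha: float = 1.0,
                        eps: float = 1e-3) -> Dict[str, float]:
    """Turn per-task scores into sampling probabilities.

    Hypothetical rule: weight each task by (baseline / current score) ** alpha,
    so a task lagging further behind its baseline is sampled more often.
    Scores are floored at eps to avoid division by zero early in training.
    """
    weights = {}
    for task, score in scores.items():
        ratio = max(score, eps) / max(baselines[task], eps)  # current / baseline
        weights[task] = (1.0 / ratio) ** alpha
    total = sum(weights.values())
    # Normalise so the probabilities sum to 1.
    return {task: w / total for task, w in weights.items()}


def sample_task(probs: Dict[str, float], rng: random.Random) -> str:
    """Draw the task for the next training batch according to probs."""
    tasks = list(probs)
    return rng.choices(tasks, weights=[probs[t] for t in tasks], k=1)[0]


if __name__ == "__main__":
    # Made-up scores: the low-resource pair (en-ro) is further below its baseline.
    probs = adaptive_task_probs({"en-fr": 38.0, "en-ro": 20.0},
                                {"en-fr": 40.0, "en-ro": 28.0})
    print(probs)
    print(sample_task(probs, random.Random(0)))
```

With these illustrative numbers, en-ro reaches about 71% of its baseline and en-fr about 95%, so en-ro receives roughly 57% of the batches. Raising `alpha` sharpens the preference for lagging tasks, which is also what can make an explicit schedule heavily over-sample a single task.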
Findings
Adaptive schedules improve low-resource task performance.
Implicit scheduling effectively balances task learning.
Proposed methods outperform uniform sampling baselines.
Abstract
To train neural machine translation models simultaneously on multiple tasks (languages), it is common to sample each task uniformly or in proportion to dataset sizes. As these methods offer little control over performance trade-offs, we explore different task scheduling approaches. We first consider existing non-adaptive techniques, then move on to adaptive schedules that over-sample tasks with poorer results compared to their respective baseline. As explicit schedules can be inefficient, especially if one task is highly over-sampled, we also consider implicit schedules, learning to scale learning rates or gradients of individual tasks instead. These techniques allow training multilingual models that perform better for low-resource language pairs (tasks with a small amount of data), while minimizing negative effects on high-resource tasks.
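The difference between an explicit and an implicit schedule can be sketched as follows. Instead of changing how often each task is sampled, every task contributes to each update and its gradient is scaled by a per-task weight, which acts like a per-task learning rate. This sketch assumes the weights are already given (for instance by a performance-based rule); how the paper derives or learns them is not reproduced here. The function name `implicit_update` and the plain-list parameter representation are hypothetical simplifications.

```python
from typing import Dict, List


def implicit_update(params: List[float],
                    task_grads: Dict[str, List[float]],
                    task_weights: Dict[str, float],
                    lr: float = 0.1) -> List[float]:
    """One SGD step where each task's gradient is scaled by its weight.

    Weights are renormalised to average 1, so with equal weights this reduces
    to plain SGD on the mean of the per-task gradients, and the overall step
    size stays comparable to uniform training.
    """
    n = len(task_weights)
    total = sum(task_weights.values())
    scale = {task: w * n / total for task, w in task_weights.items()}
    new_params = list(params)
    for task, grad in task_grads.items():
        for i, g in enumerate(grad):
            # Average over tasks, but let weighted tasks take larger effective steps.
            new_params[i] -= lr * scale[task] * g / n
    return new_params


if __name__ == "__main__":
    grads = {"high-resource": [1.0, 0.0], "low-resource": [0.0, 1.0]}
    # The low-resource task is weighted 3x, so its direction moves 3x as far.
    print(implicit_update([0.0, 0.0], grads, {"high-resource": 1.0, "low-resource": 3.0}))
```

Because every task still appears in every step, a strongly favoured task does not crowd the others out of the batch stream, which addresses the inefficiency of heavy over-sampling mentioned above.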
Taxonomy
Topics: Distributed and Parallel Computing Systems · Optimization and Search Problems · Reinforcement Learning in Robotics
