Sequential Reptile: Inter-Task Gradient Alignment for Multilingual Learning
Seanie Lee, Hae Beom Lee, Juho Lee, Sung Ju Hwang

TL;DR
This paper introduces Sequential Reptile, a method that aligns gradients across tasks in multilingual learning to improve knowledge transfer and reduce negative transfer, outperforming existing approaches.
Contribution
The paper proposes a simple, efficient gradient alignment technique using sequential sampling and Reptile updates to enhance multilingual and multi-task learning performance.
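The combination of sequential sampling and Reptile updates can be illustrated with a minimal sketch. It makes several assumptions:

- Each task has a toy quadratic loss L_t(θ) = ½‖θ − c_t‖².
- The task for each inner step is drawn uniformly at random.
- The helper names (task_grad, sequential_reptile) and all hyperparameter values are hypothetical.

The paper's actual models, task-sampling distribution, and settings may differ. The point of the sketch is the structure. A single inner trajectory visits mini-batches from several tasks in sequence, rather than from one task only. The outer Reptile step then moves the shared parameters toward the end of that mixed trajectory.

```python
import numpy as np


def task_grad(theta, center):
    # Gradient of a hypothetical toy task loss L_t(theta) = 0.5 * ||theta - center||^2.
    return theta - center


def sequential_reptile(theta, centers, inner_steps=4, inner_lr=0.1,
                       outer_lr=0.5, outer_steps=100, seed=0):
    """Sketch of Reptile with tasks sampled sequentially inside one inner loop.

    Assumption: tasks are drawn uniformly at random at every inner step.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float).copy()
    centers = np.asarray(centers, dtype=float)
    for _ in range(outer_steps):
        phi = theta.copy()
        # Inner trajectory: consecutive SGD steps on batches from different tasks,
        # so the trajectory mixes tasks instead of following a single one.
        for _ in range(inner_steps):
            t = rng.integers(len(centers))
            phi -= inner_lr * task_grad(phi, centers[t])
        # Reptile outer update: interpolate theta toward the end of the trajectory.
        theta += outer_lr * (phi - theta)
    return theta
```

For example, sequential_reptile([5.0, 3.0], [[1.0, 0.0], [-1.0, 0.0]]) drives the second coordinate toward 0. It also keeps the first coordinate between the two task optima.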
Findings
Significantly reduces negative transfer and catastrophic forgetting.
Outperforms relevant baselines on multi-task and zero-shot transfer tasks.
Improves inter-task gradient alignment for better knowledge sharing.
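Inter-task gradient alignment is commonly quantified by the cosine similarity between per-task gradient vectors. Positive values mean the tasks push shared parameters in similar directions. Negative values indicate conflicting updates. The sketch below is an illustrative assumption, not the paper's evaluation code. The function names pairwise_gradient_cosine and mean_off_diagonal are hypothetical.

```python
import numpy as np


def pairwise_gradient_cosine(grads):
    """Matrix of cosine similarities between flattened per-task gradients."""
    g = np.stack([np.ravel(x) for x in grads]).astype(float)
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    # Guard against zero-norm gradients before normalizing.
    unit = g / np.maximum(norms, 1e-12)
    return unit @ unit.T


def mean_off_diagonal(sim):
    """Average similarity over distinct task pairs (diagonal excluded)."""
    n = sim.shape[0]
    mask = ~np.eye(n, dtype=bool)
    return float(sim[mask].mean())
```

For gradients [1, 0], [0, 1] and [-1, 0], the pairwise similarities are 0, -1 and 0. mean_off_diagonal therefore returns -1/3.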
Abstract
Multilingual models jointly pretrained on multiple languages have achieved remarkable performance on various multilingual downstream tasks. Moreover, models finetuned on a single monolingual downstream task have been shown to generalize to unseen languages. In this paper, we first show that it is crucial for those tasks to align gradients between them in order to maximize knowledge transfer while minimizing negative transfer. Despite its importance, the existing methods for gradient alignment either have a completely different purpose, ignore inter-task alignment, or aim to solve continual learning problems in rather inefficient ways. As a result of the misaligned gradients between tasks, the model suffers from severe negative transfer in the form of catastrophic forgetting of the knowledge acquired from the pretraining. To overcome the limitations, we propose a simple yet effective method…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Topic Modeling
