Sequential Reptile: Inter-Task Gradient Alignment for Multilingual Learning

Seanie Lee; Hae Beom Lee; Juho Lee; Sung Ju Hwang

arXiv:2110.02600 · cs.CL · March 1, 2022 · 1 citation

TL;DR

This paper introduces Sequential Reptile, a method that aligns gradients across tasks in multilingual learning to improve knowledge transfer and reduce negative transfer. It outperforms relevant baselines on multi-task learning and zero-shot transfer tasks.

Contribution

The paper proposes a simple, efficient gradient alignment technique that combines sequential task sampling with Reptile updates to improve multilingual and multi-task learning performance.
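The combination of sequential sampling and Reptile updates can be illustrated with a toy sketch. The abstract on this page is truncated, so this is not the authors' implementation: it assumes that a task is sampled at every inner step and that the outer step is the standard Reptile interpolation. The quadratic task losses, the function names, and all hyperparameter values are hypothetical and chosen only for illustration.

```python
import random


def grad_quadratic(theta, center):
    """Gradient of the toy loss 0.5 * ||theta - center||^2 (assumed task loss)."""
    return [t - c for t, c in zip(theta, center)]


def sequential_reptile_step(theta, task_centers, inner_steps, inner_lr, outer_lr, rng):
    """One outer update.

    Inner loop: plain SGD in which a task is sampled at every step, so a
    single trajectory passes through several tasks.
    Outer step: Reptile-style interpolation toward the adapted parameters.
    """
    phi = list(theta)
    for _ in range(inner_steps):
        center = rng.choice(task_centers)  # sample a task for this inner step
        g = grad_quadratic(phi, center)
        phi = [p - inner_lr * gi for p, gi in zip(phi, g)]
    # Reptile outer update: theta <- theta + outer_lr * (phi - theta)
    return [t + outer_lr * (p - t) for t, p in zip(theta, phi)]


def train(task_centers, outer_iters=200, seed=0):
    """Run repeated outer updates from a zero initialization."""
    rng = random.Random(seed)
    theta = [0.0] * len(task_centers[0])
    for _ in range(outer_iters):
        theta = sequential_reptile_step(
            theta, task_centers, inner_steps=4, inner_lr=0.1, outer_lr=0.5, rng=rng
        )
    return theta


if __name__ == "__main__":
    # Two toy "tasks" whose optima are at (1, 0) and (0, 1).
    print(train([[1.0, 0.0], [0.0, 1.0]]))
```

In this toy setting the shared parameters settle between the two task optima. In a real multilingual setup, `grad_quadratic` would be replaced by the gradient of a model's loss on a mini-batch drawn from the sampled language or task.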

Findings

01 Significantly reduces negative transfer and catastrophic forgetting.

02 Outperforms relevant baselines on multi-task and zero-shot transfer tasks.

03 Improves inter-task gradient alignment for better knowledge sharing.
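One common way to quantify alignment between two task gradients is their cosine similarity. The sketch below assumes that measure for illustration; this page does not state which metric the paper reports, and the function name is hypothetical. Values near 1 indicate gradients pointing in the same direction, values near -1 indicate conflicting gradients.

```python
import math


def cosine_similarity(g1, g2):
    """Cosine of the angle between two flattened gradient vectors.

    Returns 0.0 if either vector has zero norm (an arbitrary convention
    chosen here to avoid division by zero).
    """
    dot = sum(a * b for a, b in zip(g1, g2))
    norm1 = math.sqrt(sum(a * a for a in g1))
    norm2 = math.sqrt(sum(b * b for b in g2))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    return dot / (norm1 * norm2)


if __name__ == "__main__":
    # Toy gradients for two tasks: identical, orthogonal, and opposite directions.
    print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))
    print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
    print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))
```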

Abstract

Multilingual models jointly pretrained on multiple languages have achieved remarkable performance on various multilingual downstream tasks. Moreover, models finetuned on a single monolingual downstream task have been shown to generalize to unseen languages. In this paper, we first show that it is crucial to align gradients between those tasks in order to maximize knowledge transfer while minimizing negative transfer. Despite its importance, the existing methods for gradient alignment either have a completely different purpose, ignore inter-task alignment, or aim to solve continual learning problems in rather inefficient ways. As a result of the misaligned gradients between tasks, the model suffers from severe negative transfer in the form of catastrophic forgetting of the knowledge acquired from pretraining. To overcome these limitations, we propose a simple yet effective method…

Videos

Sequential Reptile: Inter-Task Gradient Alignment for Multilingual Learning · SlidesLive

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Topic Modeling