Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models

Zirui Wang; Yulia Tsvetkov; Orhan Firat; Yuan Cao

arXiv:2010.05874 · cs.CL · October 13, 2020 · 60 cites

TL;DR

This paper introduces Gradient Vaccine, a scalable method that improves multi-task optimization in multilingual models by leveraging gradient similarity to better align updates for related languages, leading to performance gains.

Contribution

It proposes a novel optimization procedure that uses gradient similarity to improve training efficiency in massively multilingual models, addressing a limitation of existing gradient-based multi-task learning methods.

Findings

1. Significant performance improvements on multilingual translation tasks.
2. Gradient similarity correlates with language proximity and model performance.
3. The method is scalable and broadly applicable to multi-task learning.
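The second finding concerns gradient similarity measured along the optimization trajectory. The sketch below shows one way such a signal could be tracked, assuming each language's gradient is available as a flattened list of floats and smoothing per-step cosine similarities with an exponential moving average. The class name `SimilarityTracker`, the smoothing constant `beta`, the choice to start each average at the first observation, and the language codes in the toy example are all hypothetical and are not taken from the paper or its repository.

```python
import math


def cosine_similarity(g_a, g_b):
    """Cosine similarity between two flattened gradient vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(g_a, g_b))
    norm_a = math.sqrt(sum(a * a for a in g_a))
    norm_b = math.sqrt(sum(b * b for b in g_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SimilarityTracker:
    """Hypothetical helper: EMA of pairwise gradient similarity over training steps.

    `beta` is an assumed smoothing constant, not a value from the paper.
    """

    def __init__(self, beta=0.01):
        self.beta = beta
        self.ema = {}  # (lang_a, lang_b) -> smoothed cosine similarity

    def update(self, grads):
        """grads: dict mapping a language code to its flattened gradient at this step."""
        langs = sorted(grads)
        for idx, a in enumerate(langs):
            for b in langs[idx + 1:]:
                sim = cosine_similarity(grads[a], grads[b])
                prev = self.ema.get((a, b))
                # Assumption: the first observation initializes the average.
                if prev is None:
                    self.ema[(a, b)] = sim
                else:
                    self.ema[(a, b)] = (1 - self.beta) * prev + self.beta * sim
        return self.ema


# Toy example with made-up 2-D "gradients" (hypothetical values, not data from the paper).
tracker = SimilarityTracker(beta=0.5)
tracker.update({"de": [1.0, 0.0], "fr": [1.0, 0.0], "ja": [0.0, 1.0]})
smoothed = tracker.update({"de": [1.0, 1.0], "fr": [1.0, 0.0], "ja": [-1.0, 1.0]})
for pair, value in sorted(smoothed.items()):
    print(pair, round(value, 3))
```

In a real model the per-language gradients would come from separate backward passes on each language's batch; averaging over steps makes the signal less noisy than a single-step cosine.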

Abstract

Massively multilingual models subsuming tens or even hundreds of languages pose great challenges to multi-task optimization. While it is common practice to apply a language-agnostic procedure optimizing a joint multilingual task objective, how to properly characterize and take advantage of its underlying problem structure for improving optimization efficiency remains under-explored. In this paper, we attempt to peek into the black box of multilingual optimization through the lens of loss function geometry. We find that gradient similarity measured along the optimization trajectory is an important signal, which correlates well with not only language proximity but also the overall model performance. This observation helps us identify a critical limitation of existing gradient-based multi-task learning methods, and thus we derive a simple and scalable optimization procedure, named…
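As a rough illustration of a procedure that uses gradient similarity to adjust updates, here is a sketch under stated assumptions, not the authors' implementation. For each ordered pair of tasks, an exponential moving average of their observed cosine similarity serves as a target; whenever the current similarity falls below that target, task i's gradient is shifted along task j's gradient by the closed-form amount that makes their cosine equal the target. The coefficient follows from elementary geometry. The function names `vaccinate` and `gradvac_step`, the initial target of 0.0, the value of `beta`, the random pair order, and the task names in the example are assumptions for illustration; the paper's own formulation and hyperparameters may differ.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two flattened gradients (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0.0 or nv == 0.0 else dot / (nu * nv)


def vaccinate(g_i, g_j, target):
    """Hypothetical helper: return g_i + a * g_j with a chosen so cos(result, g_j) == target.

    Assumes -1 < target < 1 and non-zero gradients that are not anti-parallel.
    """
    phi = max(-1.0, min(1.0, cosine(g_i, g_j)))  # clamp floating-point rounding
    norm_i = math.sqrt(sum(x * x for x in g_i))
    norm_j = math.sqrt(sum(y * y for y in g_j))
    a = norm_i * (target * math.sqrt(1 - phi ** 2) - phi * math.sqrt(1 - target ** 2))
    a /= norm_j * math.sqrt(1 - target ** 2)
    return [x + a * y for x, y in zip(g_i, g_j)]


def gradvac_step(grads, targets, beta=0.01, rng=random):
    """Hypothetical helper: adjust each task's gradient toward pairwise targets, then sum.

    grads:   dict task -> flattened gradient (list of floats)
    targets: dict (task_i, task_j) -> EMA similarity target, updated in place.
             Missing entries start at 0.0 (an assumption of this sketch).
    """
    adjusted = []
    for i, g_orig in grads.items():
        g = list(g_orig)
        others = [j for j in grads if j != i]
        rng.shuffle(others)  # random pair order (assumed)
        for j in others:
            target = targets.get((i, j), 0.0)
            phi = cosine(g, grads[j])  # assumed: measured on the partially adjusted g
            if phi < target:
                g = vaccinate(g, grads[j], target)
            # Move the target toward the similarity observed at this step.
            targets[(i, j)] = (1 - beta) * target + beta * phi
        adjusted.append(g)
    return [sum(column) for column in zip(*adjusted)]


# Toy example with made-up gradients for two hypothetical language-pair tasks.
targets = {}
update = gradvac_step({"en-de": [1.0, 0.0], "en-ja": [-1.0, 1.0]}, targets, rng=random.Random(0))
print(update, targets)
```

Substituting a target of 0 into `vaccinate` gives g_i - (g_i·g_j / ||g_j||^2) g_j, the projection that removes the conflicting component, as in PCGrad-style gradient surgery. Letting the target adapt above 0 means the adjustment can also act on pairs whose gradients are positively but only weakly aligned.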

Code & Models

Repositories

chenllliang/Gradient-Vaccine (PyTorch)

Videos

Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models (SlidesLive)

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning