ALIGN-MLM: Word Embedding Alignment is Crucial for Multilingual Pre-training

Henry Tang; Ameet Deshpande; Karthik Narasimhan

arXiv:2211.08547 · cs.CL · November 17, 2022 · 1 citation

TL;DR

ALIGN-MLM is a pre-training objective that explicitly aligns word embeddings across languages. It significantly improves zero-shot transfer, especially between languages that differ in script and structure.

Contribution

The paper proposes ALIGN-MLM, a pre-training objective that adds an explicit word embedding alignment loss, and shows that it matches or outperforms existing objectives (MLM, XLM, DICT-MLM) on cross-lingual transfer.

Findings

01 ALIGN-MLM outperforms XLM and MLM by 35 and 30 F1 points, respectively, on POS tagging.

02 Embedding alignment correlates strongly with transfer success (rho = 0.727).

03 Explicitly aligning word embeddings improves the transferability of multilingual models.
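
As a rough illustration of how a correlation like the one in finding 02 could be computed, the sketch below implements a rank correlation in plain Python. It assumes that rho denotes Spearman's rank correlation, which this summary does not state. The function names and the numbers in the usage example are hypothetical and are not results from the paper.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    std_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up scores for four hypothetical model checkpoints.
alignment_scores = [0.2, 0.5, 0.9, 0.4]
transfer_scores = [30.0, 55.0, 80.0, 50.0]
rho = spearman(alignment_scores, transfer_scores)
```

Because the statistic depends only on ranks, it measures whether better-aligned models also transfer better, without assuming the relationship is linear.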

Abstract

Multilingual pre-trained models exhibit zero-shot cross-lingual transfer, where a model fine-tuned on a source language achieves surprisingly good performance on a target language. While studies have attempted to understand transfer, they focus only on MLM, and the large number of differences between natural languages makes it hard to disentangle the importance of different properties. In this work, we specifically highlight the importance of word embedding alignment by proposing a pre-training objective (ALIGN-MLM) whose auxiliary loss guides similar words in different languages to have similar word embeddings. ALIGN-MLM either outperforms or matches three widely adopted objectives (MLM, XLM, DICT-MLM) when we evaluate transfer between pairs of natural languages and their counterparts created by systematically modifying specific properties like the script. In particular, ALIGN-MLM…
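
The auxiliary loss described in the abstract can be pictured with a minimal sketch. It assumes the loss is the mean cosine distance between the embeddings of translation pairs drawn from a bilingual dictionary, added to the MLM loss with a weight `alpha`. The abstract does not give the exact formulation, so the function names, the weighting scheme, and the toy embeddings are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def alignment_loss(embeddings, word_pairs):
    """Mean (1 - cosine similarity) over (source, target) translation pairs.

    Assumed form of the auxiliary loss: it is 0 when every pair of
    translations has embeddings pointing in the same direction.
    """
    total = sum(
        1.0 - cosine_similarity(embeddings[src], embeddings[tgt])
        for src, tgt in word_pairs
    )
    return total / len(word_pairs)


def combined_loss(mlm_loss, embeddings, word_pairs, alpha=1.0):
    """MLM loss plus the weighted alignment term (alpha is hypothetical)."""
    return mlm_loss + alpha * alignment_loss(embeddings, word_pairs)


# Toy 2-d embeddings: "dog"/"chien" are aligned, "cat"/"chat" are not.
toy_embeddings = {
    "dog": [1.0, 0.0],
    "chien": [1.0, 0.0],
    "cat": [0.0, 1.0],
    "chat": [1.0, 0.0],
}
toy_pairs = [("dog", "chien"), ("cat", "chat")]
aux = alignment_loss(toy_embeddings, toy_pairs)
loss = combined_loss(2.0, toy_embeddings, toy_pairs, alpha=0.5)
```

In an actual training setup the embeddings would be rows of the model's input embedding matrix and the loss would be differentiated with an autodiff framework. The plain-Python version only shows which quantity is being minimized.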

Code & Models

Repositories

princeton-nlp/align-mlm (jax · Official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Artificial Intelligence in Healthcare and Education

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Byte Pair Encoding · Dropout · Attention Dropout · Dense Connections · Layer Normalization · Residual Connection