On the Cross-lingual Transferability of Monolingual Representations

Mikel Artetxe; Sebastian Ruder; Dani Yogatama

arXiv:1910.11856 · cs.CL · December 28, 2021

TL;DR

This paper investigates whether monolingual transformer models can be transferred to new languages without a shared vocabulary. The transferred models are competitive with multilingual BERT, which challenges the common explanation for why multilingual models generalize across languages.

Contribution

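The paper introduces a simple method for transferring a monolingual model to a new language: learn a new embedding matrix with the masked language modeling objective while keeping all other layers frozen. The method gives competitive results without a shared vocabulary or joint training.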

Findings

01

The monolingual transfer approach is competitive with multilingual BERT on cross-lingual tasks.

02

Deep monolingual models learn abstractions that generalize across languages.

03

The proposed method challenges the belief that shared vocabularies are essential for cross-lingual transfer.

Abstract

State-of-the-art unsupervised multilingual models (e.g., multilingual BERT) have been shown to generalize in a zero-shot cross-lingual setting. This generalization ability has been attributed to the use of a shared subword vocabulary and joint training across multiple languages giving rise to deep multilingual abstractions. We evaluate this hypothesis by designing an alternative approach that transfers a monolingual model to new languages at the lexical level. More concretely, we first train a transformer-based masked language model on one language, and transfer it to a new language by learning a new embedding matrix with the same masked language modeling objective, freezing parameters of all other layers. This approach does not rely on a shared vocabulary or joint training. However, we show that it is competitive with multilingual BERT on standard cross-lingual classification…
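The transfer step described in the abstract (learn a new embedding matrix for the new language while every other layer stays frozen) can be illustrated with a toy sketch. This is not the authors' implementation: a single linear map named `body` stands in for the pretrained transformer layers, a next-token-style softmax loss stands in for the masked language modeling objective, and tying input and output embeddings is an assumption of this sketch. All names (`train_step`, `new_embeddings`, `l2_emb`) are hypothetical.

```python
import math
import random


def matvec(W, x):
    """Return W @ x for a list-of-rows matrix W."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def mat_t_vec(W, g):
    """Return W^T @ g."""
    return [sum(W[i][j] * g[i] for i in range(len(W))) for j in range(len(W[0]))]


def new_embeddings(vocab_size, dim, rng):
    """Fresh embedding matrix for the new language (small random init)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]


def train_step(emb, body, context, target, lr=0.1):
    """One SGD step on -log p(target | context); only `emb` is updated."""
    h = matvec(body, emb[context])  # output of the frozen body
    logits = [sum(e * hi for e, hi in zip(row, h)) for row in emb]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    loss = -math.log(probs[target])

    # Gradient of the loss w.r.t. the logits: softmax minus one-hot target.
    dlogits = [p - (1.0 if j == target else 0.0) for j, p in enumerate(probs)]
    # Gradients are computed before any embedding row is modified.
    dh = [sum(dlogits[j] * emb[j][k] for j in range(len(emb))) for k in range(len(h))]
    d_ctx = mat_t_vec(body, dh)  # flows through the body, but body is not updated

    for j, row in enumerate(emb):  # output-side use of the embeddings
        for k in range(len(row)):
            row[k] -= lr * dlogits[j] * h[k]
    for k in range(len(d_ctx)):  # input-side use of the context embedding
        emb[context][k] -= lr * d_ctx[k]
    return loss


rng = random.Random(0)
dim = 4
# Stand-in for layers pretrained on the first language (random here, frozen).
body = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
body_before = [row[:] for row in body]

# New language: its own vocabulary, so a new embedding matrix is learned.
l2_vocab_size = 6
l2_emb = new_embeddings(l2_vocab_size, dim, rng)
emb_before = [row[:] for row in l2_emb]

pairs = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]  # toy (context, target) ids
for _ in range(50):
    for context, target in pairs:
        train_step(l2_emb, body, context, target)
```

The point of the sketch is the update rule: gradients pass through `body` to reach the embeddings, but `body` itself is never written to, so the two languages need no shared vocabulary and no joint training.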

Taxonomy

Methods: Linear Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece · Softmax