How Do Multilingual Encoders Learn Cross-lingual Representation?

Shijie Wu

arXiv:2207.05737·cs.CL·July 13, 2022·1 cites

TL;DR

This paper investigates how multilingual encoders such as mBERT learn cross-lingual representations without any explicit cross-lingual signal, analyzes their behavior across languages, and proposes improvements for better transfer.

Contribution

It provides a comprehensive analysis of the mechanisms behind cross-lingual learning in multilingual encoders and explores methods to enhance their transfer capabilities.

Findings

01

Multilingual BERT learns cross-lingual representations without any explicit cross-lingual signal.

02

Cross-lingual transfer effectiveness varies between high-resource and low-resource languages.

03

Injecting explicit cross-lingual signals can improve cross-lingual transfer performance.

Abstract

NLP systems typically require support for more than one language. As different languages have different amounts of supervision, cross-lingual transfer benefits languages with little to no training data by transferring from other languages. From an engineering perspective, multilingual NLP benefits development and maintenance by serving multiple languages with a single system. Both cross-lingual transfer and multilingual NLP rely on cross-lingual representations serving as the foundation. As BERT revolutionized representation learning and NLP, it also revolutionized cross-lingual representations and cross-lingual transfer. Multilingual BERT was released as a replacement for single-language BERT, trained with Wikipedia data in 104 languages. Surprisingly, without any explicit cross-lingual signal, multilingual BERT learns cross-lingual representations in addition to representations for…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Attention Dropout · Adam · Residual Connection · Layer Normalization · Linear Warmup With Linear Decay · Weight Decay