How Do Multilingual Encoders Learn Cross-lingual Representation?
Shijie Wu

TL;DR
This paper investigates how multilingual encoders such as mBERT learn cross-lingual representations without explicit cross-lingual signals, analyzes their behavior across languages, and proposes improvements for better transfer.
Contribution
It provides a comprehensive analysis of the mechanisms behind cross-lingual learning in multilingual encoders and explores methods to enhance their transfer capabilities.
Findings
Multilingual BERT learns cross-lingual representations without any explicit cross-lingual signal.
The effectiveness of cross-lingual transfer varies between high-resource and low-resource languages.
Injecting explicit cross-lingual signals can improve cross-lingual transfer performance.
Abstract
NLP systems typically require support for more than one language. As different languages have different amounts of supervision, cross-lingual transfer benefits languages with little to no training data by transferring from other languages. From an engineering perspective, multilingual NLP benefits development and maintenance by serving multiple languages with a single system. Both cross-lingual transfer and multilingual NLP rely on cross-lingual representations serving as the foundation. As BERT revolutionized representation learning and NLP, it also revolutionized cross-lingual representations and cross-lingual transfer. Multilingual BERT was released as a replacement for single-language BERT, trained with Wikipedia data in 104 languages. Surprisingly, without any explicit cross-lingual signal, multilingual BERT learns cross-lingual representations in addition to representations for…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Attention Dropout · Adam · Residual Connection · Layer Normalization · Linear Warmup With Linear Decay · Weight Decay
