It's not Greek to mBERT: Inducing Word-Level Translations from Multilingual BERT

Hila Gonen; Shauli Ravfogel; Yanai Elazar; Yoav Goldberg

arXiv:2010.08275 · cs.CL · October 19, 2020

TL;DR

This paper investigates how multilingual BERT encodes word-level translation information, revealing that it contains both language-specific and cross-lingual components, which can be extracted with simple methods without fine-tuning.

Contribution

The authors introduce two straightforward methods to extract translation capabilities from mBERT, and identify an empirical language-identity subspace within its representations.

Findings

01

Most translation information is non-linearly encoded in mBERT.

02

Some translation information can be recovered with linear tools.

03

An empirical language-identity subspace exists within mBERT representations.
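Finding 02 says that some translation information can be recovered with linear tools. The following is a minimal sketch of one such tool, not the authors' code (the official implementation is in gonenhila/mbert). It assumes that each word vector is roughly a shared "meaning" part plus a per-language offset. Under that assumption, moving a source vector by (target-language centroid minus source-language centroid) and retrieving its nearest target vector recovers the translation. The function name `nearest_target`, the Euclidean retrieval, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def nearest_target(src_vecs: np.ndarray, tgt_vecs: np.ndarray, shift: bool = True) -> np.ndarray:
    """Index of the nearest target row (Euclidean) for each source row.

    With shift=True, every source vector is first moved by
    (target centroid - source centroid), a purely linear operation.
    """
    queries = src_vecs
    if shift:
        queries = src_vecs + (tgt_vecs.mean(axis=0) - src_vecs.mean(axis=0))
    # Pairwise squared Euclidean distances, shape (n_src, n_tgt).
    dists = ((queries[:, None, :] - tgt_vecs[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


# Hypothetical synthetic stand-in for mBERT word vectors (not data from the paper):
# word vector = shared "meaning" vector + per-language offset + small noise.
rng = np.random.default_rng(0)
n_words, dim = 50, 64
meaning = rng.normal(size=(n_words, dim))
src = meaning + rng.normal(scale=3.0, size=dim) + rng.normal(scale=0.05, size=(n_words, dim))
tgt = meaning + rng.normal(scale=3.0, size=dim) + rng.normal(scale=0.05, size=(n_words, dim))

gold = np.arange(n_words)  # row i of tgt is the translation of row i of src
acc_shift = float((nearest_target(src, tgt, shift=True) == gold).mean())
acc_plain = float((nearest_target(src, tgt, shift=False) == gold).mean())
print(f"with centroid shift: {acc_shift:.2f}, without: {acc_plain:.2f}")
```

On this idealized data the centroid shift alone is enough to match every word. Real mBERT vectors are far less clean, so treat the sketch as an illustration of the linear idea rather than a reproduction of the paper's results.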

Abstract

Recent works have demonstrated that multilingual BERT (mBERT) learns rich cross-lingual representations that allow for transfer across languages. We study the word-level translation information embedded in mBERT and present two simple methods that expose remarkable translation capabilities with no fine-tuning. The results suggest that most of this information is encoded in a non-linear way, while some of it can also be recovered with purely linear tools. As part of our analysis, we test the hypothesis that mBERT learns representations which contain both a language-encoding component and an abstract, cross-lingual component, and explicitly identify an empirical language-identity subspace within mBERT representations.
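The hypothesis in the abstract splits each representation into a language-encoding component and a cross-lingual component. The sketch below shows one simple linear way to make that split concrete. The page does not say how the authors identify the subspace, and their procedure may differ. Here the "language-identity subspace" is taken to be the span of the per-language centroid offsets, computed with an SVD, and is then projected out. The helper names `language_subspace` and `remove_subspace` and the synthetic four-language data are hypothetical.

```python
import numpy as np


def language_subspace(vecs_by_lang: list[np.ndarray]) -> np.ndarray:
    """Orthonormal basis (as columns) spanning the per-language centroid offsets."""
    centroids = np.stack([v.mean(axis=0) for v in vecs_by_lang])
    offsets = centroids - centroids.mean(axis=0)  # rank is at most n_langs - 1
    _, sing, vt = np.linalg.svd(offsets, full_matrices=False)
    rank = int((sing > 1e-8 * sing.max()).sum())
    return vt[:rank].T  # shape (dim, rank)


def remove_subspace(x: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Project rows of x onto the orthogonal complement of span(basis)."""
    return x - (x @ basis) @ basis.T


def centroid_spread(vecs_by_lang: list[np.ndarray]) -> float:
    """Largest coordinate-wise gap between any language centroid and the first one."""
    c = np.stack([v.mean(axis=0) for v in vecs_by_lang])
    return float(np.abs(c - c[0]).max())


# Hypothetical synthetic data: 4 "languages" share the same 50 meaning vectors,
# each shifted by its own random offset, plus small noise.
rng = np.random.default_rng(1)
n_words, dim, n_langs = 50, 64, 4
meaning = rng.normal(size=(n_words, dim))
langs = [meaning + rng.normal(scale=3.0, size=dim)
         + rng.normal(scale=0.05, size=(n_words, dim)) for _ in range(n_langs)]

basis = language_subspace(langs)
cleaned = [remove_subspace(v, basis) for v in langs]
spread_before = centroid_spread(langs)
spread_after = centroid_spread(cleaned)

# Cross-lingual nearest neighbor (language 0 -> language 1), with no shift at all.
d = ((cleaned[0][:, None, :] - cleaned[1][None, :, :]) ** 2).sum(axis=-1)
nn_acc = float((d.argmin(axis=1) == np.arange(n_words)).mean())
print(basis.shape, spread_before, spread_after, nn_acc)
```

With four languages the subspace has at most three dimensions. Removing it collapses the language centroids onto one another while leaving almost all of the shared component intact, which is why plain nearest-neighbor lookup then pairs each word with its counterpart.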

Code & Models

Repositories

gonenhila/mbert
PyTorch · Official

Taxonomy

Methods: Linear Layer · mBERT · WordPiece · Adam · Softmax · Layer Normalization · Dense Connections · Multi-Head Attention · Dropout