How Language-Neutral is Multilingual BERT?
Jindřich Libovický, Rudolf Rosa, Alexander Fraser

TL;DR
This paper investigates the semantic capabilities of multilingual BERT, revealing it can produce language-neutral representations useful for certain tasks but still faces challenges in tasks demanding deep semantic transfer.
Contribution
It demonstrates that mBERT's representations can be decomposed into language-specific and language-neutral parts, highlighting the potential and limitations of its semantic transfer abilities.
Findings
mBERT's representations can be split into language-specific and language-neutral components
The language-neutral component enables high-accuracy word-alignment and sentence retrieval
mBERT's semantic representations are not yet sufficient for machine translation quality estimation
Abstract
Multilingual BERT (mBERT) provides sentence representations for 104 languages, which are useful for many multi-lingual tasks. Previous work probed the cross-linguality of mBERT using zero-shot transfer learning on morphological and syntactic tasks. We instead focus on the semantic properties of mBERT. We show that mBERT representations can be split into a language-specific component and a language-neutral component, and that the language-neutral component is sufficiently general in terms of modeling semantics to allow high-accuracy word-alignment and sentence retrieval but is not yet good enough for the more difficult task of MT quality estimation. Our work presents interesting challenges which must be solved to build better language-neutral representations, particularly for tasks requiring linguistic transfer of semantics.
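The abstract does not say how the split into a language-specific and a language-neutral component is computed. As a minimal sketch, assume the language-specific part is approximated by each language's mean vector (its centroid), and that subtracting it leaves the language-neutral part. Random toy vectors stand in for real mBERT sentence embeddings here, and the names `center_by_language` and `retrieve` are hypothetical helpers, not part of any library.

```python
import numpy as np


def center_by_language(embeddings, languages):
    """Subtract each language's centroid from that language's embeddings.

    Assumption: the centroid approximates the language-specific component.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    languages = np.asarray(languages)
    centered = np.empty_like(embeddings)
    centroids = {}
    for lang in np.unique(languages):
        mask = languages == lang
        centroid = embeddings[mask].mean(axis=0)
        centroids[str(lang)] = centroid
        centered[mask] = embeddings[mask] - centroid
    return centered, centroids


def retrieve(queries, candidates):
    """For each query row, return the index of the most cosine-similar candidate."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    return (q @ c.T).argmax(axis=1)


# Toy data: a shared "meaning" vector per sentence pair plus a large
# per-language offset and a little noise (all synthetic, for illustration).
rng = np.random.default_rng(0)
n, dim = 50, 16
meaning = rng.normal(size=(n, dim))
en = meaning + 5.0 * rng.normal(size=dim) + 0.05 * rng.normal(size=(n, dim))
de = meaning + 5.0 * rng.normal(size=dim) + 0.05 * rng.normal(size=(n, dim))

embeddings = np.vstack([en, de])
languages = np.array(["en"] * n + ["de"] * n)
centered, centroids = center_by_language(embeddings, languages)

# Parallel sentence retrieval: sentence i in "en" should match sentence i in "de".
gold = np.arange(n)
raw_acc = (retrieve(en, de) == gold).mean()
centered_acc = (retrieve(centered[:n], centered[n:]) == gold).mean()
print(f"retrieval accuracy raw={raw_acc:.2f} centered={centered_acc:.2f}")
```

On this toy data the per-language offsets dominate the raw vectors, so comparing the two printed accuracies gives a feel for why removing a language-specific component could help sentence retrieval. It says nothing about how real mBERT embeddings behave.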
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Linear Layer · mBERT · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece
