On the Language-specificity of Multilingual BERT and the Impact of Fine-tuning

Marc Tanti; Lonneke van der Plas; Claudia Borg; Albert Gatt

arXiv:2109.06935 · cs.CL · December 28, 2021

TL;DR

This paper investigates how fine-tuning affects multilingual BERT's language-specific and language-neutral knowledge, showing that fine-tuning reorganizes representations to favor language-independent features at the cost of language-specific ones.

Contribution

It provides a detailed analysis of the impact of fine-tuning on mBERT's language representations and explores methods to unlearn language-specific features.

Findings

01

Fine-tuning reduces mBERT's ability to cluster by language.

02

Language identification accuracy drops after fine-tuning.

03

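Unlearning language-specific representations with gradient reversal or iterative adversarial learning adds no further improvement to the language-independent component beyond what fine-tuning already achieves.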

Abstract

Recent work has shown evidence that the knowledge acquired by multilingual BERT (mBERT) has two components: a language-specific and a language-neutral one. This paper analyses the relationship between them, in the context of fine-tuning on two tasks -- POS tagging and natural language inference -- which require the model to bring to bear different degrees of language-specific knowledge. Visualisations reveal that mBERT loses the ability to cluster representations by language after fine-tuning, a result that is supported by evidence from language identification experiments. However, further experiments on 'unlearning' language-specific representations using gradient reversal and iterative adversarial learning are shown not to add further improvement to the language-independent component over and above the effect of fine-tuning. The results presented here suggest that the process of…
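The gradient reversal idea mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' code (their repository uses PyTorch); it is a minimal pure-Python illustration under stated assumptions: a scalar weight `w` stands in for the encoder, a scalar weight `v` stands in for a language classifier, and the names `grl_step`, `lam`, and `lr` are hypothetical. The classifier follows the ordinary gradient of the language-identification loss, while the encoder receives that gradient with its sign flipped and scaled by `lam`, so it is pushed away from features that help identify the language.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def grl_step(w, v, x, y, lam=1.0, lr=0.1):
    """One SGD step on a toy scalar model with gradient reversal.

    feature = w * x            (hypothetical stand-in for the encoder)
    p = sigmoid(v * feature)   (hypothetical stand-in for the language classifier)
    loss = binary cross-entropy between p and the label y
    """
    feature = w * x
    p = sigmoid(v * feature)

    # Gradient of the loss with respect to the logit (sigmoid + cross-entropy).
    d_logit = p - y

    # The classifier weight is trained normally: it descends on the loss.
    grad_v = d_logit * feature

    # Gradient arriving at the feature, i.e. at the reversal layer.
    grad_feature = d_logit * v

    # Gradient reversal: identity in the forward pass, sign flip
    # (scaled by lam) in the backward pass.
    grad_feature_reversed = -lam * grad_feature

    # The encoder weight therefore ascends on the classifier's loss.
    grad_w = grad_feature_reversed * x

    return w - lr * grad_w, v - lr * grad_v


if __name__ == "__main__":
    new_w, new_v = grl_step(w=1.0, v=1.0, x=1.0, y=1.0)
    print(new_w, new_v)
```

With `lam=0.0` the encoder weight receives no gradient from the language classifier at all; larger values of `lam` strengthen the adversarial pressure on the encoder.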

Code & Models

Repositories

mtanti/mbert-language-specificity
pytorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Attention Is All You Need · Linear Layer · mBERT · Layer Normalization · Linear Warmup With Linear Decay · Weight Decay · Adam · Residual Connection · Multi-Head Attention