What the [MASK]? Making Sense of Language-Specific BERT Models
Debora Nozza, Federico Bianchi, Dirk Hovy

TL;DR
This paper surveys language-specific BERT models and compares them with multilingual BERT (mBERT), evaluating their performance across languages, domains, and tasks. It also provides an interactive platform for exploring the models.
Contribution
It offers a comprehensive overview of language-specific BERT models, highlighting their differences from mBERT and assessing their effectiveness in various NLP scenarios.
Findings
Language-specific BERT models often outperform mBERT on their respective languages.
Differences in architecture and training data significantly impact model performance.
An interactive website is provided for ongoing comparison and analysis.
Abstract
Recently, Natural Language Processing (NLP) has witnessed impressive progress in many areas, due to the advent of novel, pretrained contextual representation models. In particular, Devlin et al. (2019) proposed a model, called BERT (Bidirectional Encoder Representations from Transformers), which enables researchers to obtain state-of-the-art performance on numerous NLP tasks by fine-tuning the representations on their data set and task, without the need for developing and training highly specific architectures. The authors also released multilingual BERT (mBERT), a model trained on a corpus of 104 languages, which can serve as a universal language model. This model obtained impressive results on a zero-shot cross-lingual natural language inference task. Driven by the potential of BERT models, the NLP community has started to investigate and generate an abundant number of BERT models that are…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)
Methods: Linear Layer · mBERT · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece
