WikiBERT models: deep transfer learning for many languages
Sampo Pyysalo, Jenna Kanerva, Antti Virtanen, Filip Ginter

TL;DR
This paper introduces 42 new language-specific BERT models trained on Wikipedia data, demonstrating that these models often outperform multilingual BERT in parsing tasks, with benefits varying across languages.
Contribution
The paper presents a fully automated pipeline for creating language-specific BERT models and evaluates their performance, providing new resources and insights into mono- versus multilingual training tradeoffs.
Findings
Language-specific WikiBERT models often outperform mBERT in parsing tasks.
Performance improvements vary significantly across different languages.
The study offers preliminary insights into when language-specific models are most advantageous.
Abstract
Deep neural language models such as BERT have enabled substantial recent advances in many natural language processing tasks. Due to the effort and computational cost involved in their pre-training, language-specific models are typically introduced only for a small number of high-resource languages such as English. While multilingual models covering large numbers of languages are available, recent work suggests monolingual training can produce better models, and our understanding of the tradeoffs between mono- and multilingual training is incomplete. In this paper, we introduce a simple, fully automated pipeline for creating language-specific BERT models from Wikipedia data and introduce 42 new such models, most for languages up to now lacking dedicated deep neural language models. We assess the merits of these models using the state-of-the-art UDify parser on Universal Dependencies…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: Linear Layer · mBERT · Weight Decay · Softmax · Adam · Multi-Head Attention · Dropout · Attention Dropout · Linear Warmup With Linear Decay
