Multilingual is not enough: BERT for Finnish
Antti Virtanen, Jenna Kanerva, Rami Ilo, Jouni Luoma, Juhani Luotolahti, Tapio Salakoski, Filip Ginter, Sampo Pyysalo

TL;DR
This paper evaluates multilingual BERT for Finnish and introduces a new Finnish-specific BERT model that outperforms the multilingual version and sets new state-of-the-art results across multiple NLP tasks.
Contribution
The paper presents a Finnish-specific BERT model trained from scratch, demonstrating its superiority over multilingual BERT for Finnish NLP tasks.
Findings
Finnish BERT outperforms multilingual BERT on all tasks
The new Finnish BERT achieves state-of-the-art results
Multilingual models may not be sufficient for lower-resourced languages
Abstract
Deep learning-based language models pretrained on large unannotated text corpora have been demonstrated to allow efficient transfer learning for natural language processing, with recent approaches such as the transformer-based BERT model advancing the state of the art across a variety of tasks. While most work on these models has focused on high-resource languages, in particular English, a number of recent efforts have introduced multilingual models that can be fine-tuned to address tasks in a large number of different languages. However, we still lack a thorough understanding of the capabilities of these models, in particular for lower-resourced languages. In this paper, we focus on Finnish and thoroughly evaluate the multilingual BERT model on a range of tasks, comparing it with a new Finnish BERT model trained from scratch. The new language-specific model is shown to systematically…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Linear Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece · Softmax
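
Of the methods listed, linear warmup with linear decay is the one most easily shown in a few lines. The following is a minimal sketch of such a learning-rate schedule, not the authors' training code: the function name and the values for peak learning rate, warmup steps, and total steps are all illustrative assumptions and are not taken from the paper.

```python
def linear_warmup_linear_decay(step, peak_lr, warmup_steps, total_steps):
    """Return the learning rate at `step`.

    The rate rises linearly from 0 to `peak_lr` over `warmup_steps`,
    then falls linearly back to 0 at `total_steps`.
    All parameter names and values are hypothetical, for illustration only.
    """
    if warmup_steps < 0 or total_steps <= warmup_steps:
        raise ValueError("require 0 <= warmup_steps < total_steps")
    if step < warmup_steps:
        # Warmup phase: scale the peak rate by the fraction of warmup completed.
        return peak_lr * step / warmup_steps
    # Decay phase: scale the peak rate by the fraction of decay steps remaining.
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / (total_steps - warmup_steps)


if __name__ == "__main__":
    # Illustrative values only: peak rate 1e-4, 100 warmup steps, 1000 total steps.
    for s in (0, 50, 100, 550, 1000):
        print(s, linear_warmup_linear_decay(s, 1e-4, 100, 1000))
```

In practice a schedule like this is usually paired with an optimizer such as Adam with weight decay, both of which also appear in the methods list above.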
