FinEst BERT and CroSloEngual BERT: less is more in multilingual models
Matej Ulčar, Marko Robnik-Šikonja

TL;DR
This paper introduces two trilingual BERT-like models, FinEst BERT (Finnish, Estonian, English) and CroSloEngual BERT (Croatian, Slovenian, English), and shows that they outperform massively multilingual models on NER, POS tagging, and dependency parsing in most settings.
Contribution
The paper presents two new trilingual BERT-like models that outperform multilingual BERT and XLM-R on several downstream tasks, suggesting that models covering fewer languages can be more effective than massively multilingual ones.
Findings
FinEst BERT and CroSloEngual BERT outperform multilingual BERT and XLM-R on NER, POS tagging, and dependency parsing in most settings.
The improvements hold in most monolingual and cross-lingual evaluations.
Models trained on fewer languages can be more effective than massively multilingual models.
Abstract
Large pretrained masked language models have become state-of-the-art solutions for many NLP problems. The research has been mostly focused on the English language, though. While massively multilingual models exist, studies have shown that monolingual models produce much better results. We train two trilingual BERT-like models, one for Finnish, Estonian, and English, the other for Croatian, Slovenian, and English. We evaluate their performance on several downstream tasks, NER, POS-tagging, and dependency parsing, using the multilingual BERT and XLM-R as baselines. The newly created FinEst BERT and CroSloEngual BERT improve the results on all tasks in most monolingual and cross-lingual situations.
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: XLM-R · Linear Layer · Weight Decay · Softmax · Adam · Multi-Head Attention · Dropout · Attention Dropout · Linear Warmup With Linear Decay
