Evaluating Multilingual BERT for Estonian
Claudia Kittask, Kirill Milintsevich, Kairit Sirts

TL;DR
This study evaluates four multilingual BERT models on Estonian NLP tasks, demonstrating their strong performance and potential as effective tools for Estonian language processing.
Contribution
It provides a comprehensive comparison of multilingual BERT models for Estonian NLP tasks, highlighting their effectiveness over existing baselines.
Findings
Multilingual BERT models outperform baseline models in POS and morphological tagging.
XLM-RoBERTa achieves the highest results among the evaluated models.
Multilingual models generalize well across different Estonian NLP tasks.
Abstract
Recently, large pre-trained language models, such as BERT, have reached state-of-the-art performance in many natural language processing tasks, but for many languages, including Estonian, BERT models are not yet available. However, there exist several multilingual BERT models that can handle multiple languages simultaneously and that have also been trained on Estonian data. In this paper, we evaluate four multilingual models (multilingual BERT, multilingual distilled BERT, XLM and XLM-RoBERTa) on several NLP tasks, including POS and morphological tagging, NER and text classification. Our aim is to establish a comparison between these multilingual BERT models and the existing baseline neural models for these tasks. Our results show that multilingual BERT models can generalise well on different Estonian NLP tasks, outperforming all baseline models for POS and morphological tagging and…
Taxonomy
Methods: Linear Layer · Byte Pair Encoding · Adam · Softmax · Dense Connections · XLM · Dropout · Linear Warmup With Linear Decay · Layer Normalization
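Of the methods listed above, Linear Warmup With Linear Decay is a simple learning-rate formula. The taxonomy tag does not say how (or whether) the paper configured it, and this page gives no warmup length, step count, or peak learning rate, so the following is only a minimal sketch of the standard schedule, assuming the common definition: with peak rate η, W warmup steps and T total steps, lr(t) = η·t/W for t < W and lr(t) = η·max(0, T − t)/(T − W) afterwards. The function name and the example values are hypothetical.

```python
def linear_warmup_linear_decay(step: int, peak_lr: float,
                               warmup_steps: int, total_steps: int) -> float:
    """Learning rate at optimizer step `step` (standard definition, assumed here)."""
    if total_steps <= 0 or warmup_steps < 0:
        raise ValueError("total_steps must be positive and warmup_steps non-negative")
    if step < warmup_steps:
        # Warmup phase: rise linearly from 0 to peak_lr over warmup_steps.
        return peak_lr * step / warmup_steps
    # Decay phase: fall linearly from peak_lr to 0 at total_steps, then stay at 0.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # Hypothetical settings, not taken from the paper.
    for s in (0, 50, 100, 550, 1000):
        print(s, linear_warmup_linear_decay(s, peak_lr=5e-5, warmup_steps=100, total_steps=1000))
```

The warmup branch keeps early updates small while Adam's moment estimates are still noisy, and the linear decay brings the rate to zero exactly at the last training step.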
