Evaluating Transferability of BERT Models on Uralic Languages

Judit Ács; Dániel Lévai; András Kornai

arXiv:2109.06327·cs.CL·November 24, 2021

Open Access 1 Repo

TL;DR

This study assesses how well BERT models transfer across Uralic languages. Monolingual models (available only for Estonian, Finnish, and Hungarian) perform best on their own language, but multilingual models generally transfer better, and high-resource models can be adapted to minority languages without special hyperparameter optimization.

Contribution

It evaluates monolingual, multilingual, and randomly initialized BERT models on a range of Uralic languages, highlighting transferability patterns and practical benefits for POS tagging and NER in minority languages.

Findings

01

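Monolingual BERT models (available only for et, fi, hu) perform best on their own language, but in general they transfer worse than multilingual models or models of genetically unrelated languages that share the same character set.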

02

High-resource models transfer effectively to minority languages, even without special efforts toward hyperparameter optimization.

03

Straightforward transfer yields what appear to be state-of-the-art POS and NER tools for minority Uralic languages.

Abstract

Transformer-based language models such as BERT have outperformed previous models on a large number of English benchmarks, but their evaluation is often limited to English or a small number of well-resourced languages. In this work, we evaluate monolingual, multilingual, and randomly initialized language models from the BERT family on a variety of Uralic languages including Estonian, Finnish, Hungarian, Erzya, Moksha, Karelian, Livvi, Komi Permyak, Komi Zyrian, Northern Sámi, and Skolt Sámi. When monolingual models are available (currently only et, fi, hu), these perform better on their native language, but in general they transfer worse than multilingual models or models of genetically unrelated languages that share the same character set. Remarkably, straightforward transfer of high-resource models, even without special efforts toward hyperparameter optimization, yields what appear…

Code & Models

Repositories

juditacs/uralic_eval
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Linear Warmup With Linear Decay · Weight Decay · Attention Dropout · Dropout · Layer Normalization · Softmax · Residual Connection
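
Of the methods listed, "Linear Warmup With Linear Decay" is the easiest to show concretely. The following is a minimal sketch of the general schedule, not the authors' training code: the function name, its parameters, and the step counts in the usage note are illustrative assumptions, and this page does not state the values used in the paper.

```python
def linear_warmup_linear_decay(step, total_steps, warmup_steps, peak_lr):
    """Return the learning rate for a 0-indexed optimizer step.

    The rate rises linearly from 0 to peak_lr over warmup_steps,
    then falls linearly back to 0 at total_steps.
    All argument names and values are hypothetical, for illustration only.
    """
    if total_steps <= 0 or not 0 <= warmup_steps <= total_steps:
        raise ValueError("need total_steps > 0 and 0 <= warmup_steps <= total_steps")
    if step < warmup_steps:
        # Warmup phase: fraction of the warmup completed so far.
        return peak_lr * step / warmup_steps
    # Decay phase: fraction of the post-warmup budget still remaining,
    # clamped at 0 for steps past the end of training.
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / max(total_steps - warmup_steps, 1)
```

With the assumed values total_steps=100, warmup_steps=10, and peak_lr=1.0, the rate is 0.5 at step 5, peaks at 1.0 at step 10, and is back to 0.5 at step 55, halfway through the decay phase.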