Specializing Multilingual Language Models: An Empirical Study

Ethan C. Chau; Noah A. Smith

arXiv:2106.09063·cs.CL·June 22, 2022

TL;DR

This paper empirically investigates how vocabulary augmentation and script transliteration affect the performance of multilingual language models on part-of-speech tagging, dependency parsing, and named entity recognition in nine low-resource languages.

Contribution

The paper evaluates two adaptation techniques, vocabulary augmentation and script transliteration, both separately and in combination. It upholds their viability and raises open questions about how best to adapt multilingual models to low-resource settings.

Findings

1. Vocabulary augmentation improves low-resource language performance.
2. Script transliteration enhances model adaptability.
3. Both methods are viable but require further optimization.

Abstract

Pretrained multilingual language models have become a common tool in transferring NLP capabilities to low-resource languages, often with adaptations. In this work, we study the performance, extensibility, and interaction of two such adaptations: vocabulary augmentation and script transliteration. Our evaluations on part-of-speech tagging, universal dependency parsing, and named entity recognition in nine diverse low-resource languages uphold the viability of these approaches while raising new questions around how to optimally adapt multilingual models to low-resource settings.
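The abstract names the two adaptations without describing their mechanics. As a rough illustration only, the sketch below shows the general idea behind each one in plain Python: vocabulary augmentation appends new tokens to a vocabulary and adds a matching embedding row, and script transliteration maps characters from one script to another. The function names, the mean-initialization of new rows, and the toy character table are all assumptions made for this example; they are not taken from the paper or its repository.

```python
from typing import Dict, List, Tuple


def augment_vocabulary(
    vocab: Dict[str, int],
    embeddings: List[List[float]],
    new_tokens: List[str],
) -> Tuple[Dict[str, int], List[List[float]]]:
    """Add unseen tokens to the vocabulary, one new embedding row each.

    Assumes token ids are contiguous (0..len(vocab)-1). New rows start at
    the mean of the existing rows, which is an illustrative choice only.
    """
    dim = len(embeddings[0])
    count = len(embeddings)
    mean_row = [sum(row[i] for row in embeddings) / count for i in range(dim)]
    for token in new_tokens:
        if token not in vocab:  # skip tokens the model already knows
            vocab[token] = len(vocab)
            embeddings.append(list(mean_row))
    return vocab, embeddings


def transliterate(text: str, table: Dict[str, str]) -> str:
    """Replace each character using the table; unmapped characters pass through."""
    return "".join(table.get(ch, ch) for ch in text)


# Hypothetical toy data for demonstration.
toy_vocab = {"[UNK]": 0, "a": 1}
toy_embeddings = [[0.0, 2.0], [2.0, 4.0]]
toy_vocab, toy_embeddings = augment_vocabulary(toy_vocab, toy_embeddings, ["b", "a"])

toy_table = {"м": "m", "и": "i", "р": "r"}  # tiny Cyrillic-to-Latin mapping
latin_text = transliterate("мир!", toy_table)
```

In a real model the new embedding rows would be trained further on target-language text, and a transliteration table would cover a whole script rather than three characters.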

Code & Models

Repositories

ethch18/specializing-multilingual (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
