Specializing Multilingual Language Models: An Empirical Study
Ethan C. Chau, Noah A. Smith

TL;DR
This paper empirically investigates how vocabulary augmentation and script transliteration affect the performance and adaptability of multilingual language models across various low-resource languages and NLP tasks.
Contribution
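It evaluates two adaptation techniques, vocabulary augmentation and script transliteration, on part-of-speech tagging, universal dependency parsing, and named entity recognition in nine low-resource languages, confirming that both are viable and raising open questions about how best to adapt multilingual models to low-resource settings.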
Findings
Vocabulary augmentation improves low-resource language performance.
Script transliteration enhances model adaptability.
Both methods are viable but require further optimization.
Abstract
Pretrained multilingual language models have become a common tool in transferring NLP capabilities to low-resource languages, often with adaptations. In this work, we study the performance, extensibility, and interaction of two such adaptations: vocabulary augmentation and script transliteration. Our evaluations on part-of-speech tagging, universal dependency parsing, and named entity recognition in nine diverse low-resource languages uphold the viability of these approaches while raising new questions around how to optimally adapt multilingual models to low-resource settings.
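To make the two adaptations concrete, here is a minimal, illustrative sketch in plain Python. It is not the authors' implementation. It assumes a toy vocabulary stored as a token-to-id dictionary with contiguous ids, an embedding table stored as a list of rows, and a hand-written character mapping. The function names `augment_vocabulary` and `transliterate`, the initialization scale, and the sample tokens are all hypothetical.

```python
import random


def augment_vocabulary(vocab, embeddings, new_tokens, dim, seed=0):
    """Return copies of vocab and embeddings extended with unseen tokens.

    Assumption: vocab maps token -> id with ids 0..len(vocab)-1, and
    embeddings[i] is the vector for id i. New rows are drawn from a small
    Gaussian (an illustrative choice, not taken from the paper).
    """
    rng = random.Random(seed)
    new_vocab = dict(vocab)
    new_embeddings = [row[:] for row in embeddings]
    for token in new_tokens:
        if token in new_vocab:
            continue  # already covered by the original vocabulary
        new_vocab[token] = len(new_vocab)
        new_embeddings.append([rng.gauss(0.0, 0.02) for _ in range(dim)])
    return new_vocab, new_embeddings


def transliterate(text, table):
    """Map each character through table, leaving unmapped characters as-is."""
    return "".join(table.get(ch, ch) for ch in text)


# Toy data, invented for illustration only.
base_vocab = {"[UNK]": 0, "the": 1}
base_embeddings = [[0.0] * 4, [0.1] * 4]
vocab, embeddings = augment_vocabulary(
    base_vocab, base_embeddings, ["##lar", "the", "kitap"], dim=4
)

cyrillic_to_latin = {"м": "m", "и": "i", "р": "r"}  # tiny hypothetical table
latin_text = transliterate("мир", cyrillic_to_latin)
```

In a real setting the new embedding rows would then be trained on target-language text, and the transliteration table would cover the full script. Both steps are outside the scope of this sketch.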
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
