TL;DR
This paper introduces data-efficient methods to adapt multilingual language models to under-resourced languages and unseen scripts, leveraging matrix factorization and shared vocabulary to improve cross-lingual performance.
Contribution
The authors propose novel adaptation techniques using matrix factorization and shared tokens to enhance multilingual models for unseen scripts and low-resource languages.
Findings
Significant performance improvements for languages with unseen scripts.
Enhanced low-resource language performance with minimal additional data.
Effective adaptation using shared vocabulary tokens.
Abstract
Massively multilingual language models such as multilingual BERT offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks. However, due to limited capacity and large differences in pretraining data sizes, there is a profound performance gap between resource-rich and resource-poor target languages. The ultimate challenge is dealing with under-resourced languages not covered at all by the models and written in scripts unseen during pretraining. In this work, we propose a series of novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts. Relying on matrix factorization, our methods capitalize on the existing latent knowledge about multiple languages already available in the pretrained model's embedding matrix. Furthermore, we show that learning of the new dedicated…
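The factorization idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it uses a plain truncated SVD as a stand-in for whatever factorization the paper applies, and the function names (`factorize_embeddings`, `init_new_embeddings`), the toy vocabularies, and the dimensions are all hypothetical. It shows a pretrained embedding matrix `X` being split into low-dimensional token embeddings `F` and a shared up-projection `G`, with embeddings for a new vocabulary then built in the low-dimensional space, copying rows for tokens that both vocabularies share.

```python
import numpy as np

def factorize_embeddings(X, rank):
    """Approximate X (V x d) as F (V x rank) @ G (rank x d) via truncated SVD.

    Assumption: SVD is used here only for illustration; the paper's own
    factorization may differ.
    """
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    F = U[:, :rank] * S[:rank]   # low-dimensional, token-specific embeddings
    G = Vt[:rank, :]             # shared up-projection back to model dimension
    return F, G

def init_new_embeddings(F, old_vocab, new_vocab, rng):
    """Build low-dimensional embeddings for a new vocabulary.

    Tokens present in both vocabularies copy their existing row of F;
    all other tokens get a random initialization (to be trained later).
    """
    rank = F.shape[1]
    F_new = rng.normal(0.0, F.std(), size=(len(new_vocab), rank))
    old_index = {tok: i for i, tok in enumerate(old_vocab)}
    n_shared = 0
    for j, tok in enumerate(new_vocab):
        i = old_index.get(tok)
        if i is not None:
            F_new[j] = F[i]
            n_shared += 1
    return F_new, n_shared

# Toy setup: a 6-token "pretrained" vocabulary with 8-dimensional embeddings.
rng = np.random.default_rng(0)
old_vocab = ["[CLS]", "[SEP]", "the", "##s", "1", "."]
new_vocab = ["[CLS]", "[SEP]", "1", ".", "ሰላም", "ቤት"]  # new-script tokens at the end
X = rng.normal(size=(len(old_vocab), 8))

F, G = factorize_embeddings(X, rank=4)
F_new, n_shared = init_new_embeddings(F, old_vocab, new_vocab, rng)

# Full-size embeddings for the new vocabulary reuse the shared projection G.
X_new = F_new @ G
```

In this sketch only `F_new` would need training on the new language, while `G` carries over what the pretrained matrix already encodes, which is one way to read the data-efficiency claim.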
Taxonomy
Methods: Linear Layer · Dropout · Softmax · Linear Warmup With Linear Decay · Dense Connections · Attention Dropout · Attention Is All You Need · Layer Normalization · WordPiece
