You Can Have Your Data and Balance It Too: Towards Balanced and Efficient Multilingual Models

Tomasz Limisiewicz; Dan Malkin; Gabriel Stanovsky

arXiv:2210.07135 · cs.CL · May 29, 2023 · 1 citation

Open Access

TL;DR

This paper introduces a teacher-student knowledge distillation method for multilingual models that improves performance on low-resource languages while maintaining high-resource language performance, promoting more balanced NLP systems.

Contribution

The paper proposes a novel multilingual training technique using monolingual teacher models and balanced data to enhance low-resource language performance.

Findings

01 Outperforms standard training on low-resource languages

02 Maintains performance on high-resource languages

03 Uses the same amount of data as standard training

Abstract

Multilingual models have been widely used for cross-lingual transfer to low-resource languages. However, the performance on these languages is hindered by their underrepresentation in the pretraining data. To alleviate this problem, we propose a novel multilingual training technique based on teacher-student knowledge distillation. In this setting, we utilize monolingual teacher models optimized for their language. We use those teachers along with balanced (sub-sampled) data to distill the teachers' knowledge into a single multilingual student. Our method outperforms standard training methods in low-resource languages and retains performance on high-resource languages while using the same amount of data. If applied widely, our approach can increase the representation of low-resource languages in NLP systems.
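
The two ingredients named in the abstract, balanced sub-sampling and distillation from per-language teachers, can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not give the exact objective or sampling scheme, so the sketch assumes that "balanced" means down-sampling every language to the size of the smallest corpus, and that the student is trained with a cross-entropy loss against the teacher's softened output distribution. The names `subsample_balanced` and `distillation_loss`, the toy corpora, and the logits are all hypothetical.

```python
import math
import random


def subsample_balanced(corpora, seed=0):
    """Down-sample every language to the size of the smallest corpus.

    Assumption: this is one simple reading of "balanced (sub-sampled) data".
    """
    rng = random.Random(seed)
    n = min(len(sentences) for sentences in corpora.values())
    return {lang: rng.sample(sentences, n) for lang, sentences in corpora.items()}


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Cross-entropy of the student's distribution against the teacher's.

    Assumption: a generic soft-label distillation loss, not necessarily
    the objective used in the paper.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))


# Toy data: a high-resource and a low-resource language (hypothetical).
corpora = {
    "en": [f"en sentence {i}" for i in range(1000)],
    "sw": [f"sw sentence {i}" for i in range(50)],
}
balanced = subsample_balanced(corpora)

# Each language has its own monolingual teacher; the single multilingual
# student is scored against the teacher that matches the input language.
teacher_logits = {"en": [2.0, 0.5, -1.0], "sw": [0.1, 1.5, 0.3]}
student_logits = [1.0, 1.0, 0.0]
losses = {
    lang: distillation_loss(student_logits, teacher_logits[lang])
    for lang in balanced
}
```

In this sketch both languages contribute the same number of sentences after sub-sampling, and the loss for a given language reaches its minimum when the student's distribution matches that language's teacher.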

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications