Extracting General-use Transformers for Low-resource Languages via Knowledge Distillation

Jan Christian Blaise Cruz; Alham Fikri Aji

arXiv:2501.12660 · cs.CL · January 23, 2025

Open Access

TL;DR

This paper uses simple knowledge distillation to extract smaller, more efficient single-language transformers from massively multilingual models. With Tagalog as a case study, the distilled models perform on par with strong baselines.

Contribution

It applies simple knowledge distillation to massively multilingual transformers to produce compact models for a low-resource language, investigates additional distillation steps that improve soft supervision of the target language, and supports the method with analyses and ablations.

Findings

01. Smaller single-language models perform on par with strong baselines across a variety of benchmark tasks

02. Additional steps during distillation improve soft supervision of the target language

03. The smaller models reach this performance in a much more efficient manner

Abstract

In this paper, we propose the use of simple knowledge distillation to produce smaller and more efficient single-language transformers from Massively Multilingual Transformers (MMTs) to alleviate tradeoffs associated with the use of such models in low-resource settings. Using Tagalog as a case study, we show that these smaller single-language models perform on par with strong baselines in a variety of benchmark tasks in a much more efficient manner. Furthermore, we investigate additional steps during the distillation process that improve the soft supervision of the target language, and provide a number of analyses and ablations to show the efficacy of the proposed method.
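
For readers unfamiliar with the term, the "soft supervision" mentioned in the abstract can be illustrated with a generic soft-target distillation loss. This listing does not state the paper's exact objective, so the sketch below assumes the common formulation: a KL divergence between temperature-softened teacher and student distributions, scaled by the squared temperature. The function names, the temperature value, and the example logits are all hypothetical.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Temperature-scaled log-softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumption: this is the generic soft-target loss, not necessarily
    the objective used in the paper.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher (soft targets)
    log_q = log_softmax(student_logits, temperature)  # student predictions
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2

# Hypothetical logits over a three-token vocabulary.
teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, 0.0]
print(distillation_loss(teacher, student))
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge. A higher temperature spreads probability mass over more tokens, which exposes more of the teacher's relative preferences to the student.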

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Knowledge Distillation