Self-Distillation for Model Stacking Unlocks Cross-Lingual NLU in 200+ Languages

Fabian David Schmidt; Philipp Borchert; Ivan Vulić; Goran Glavaš

arXiv:2406.12739 · cs.CL · June 19, 2024

TL;DR

This paper introduces MT-LLMs, models that integrate machine translation encoders into large language models through self-distillation, extending cross-lingual natural language understanding to more than 200 languages.

Contribution

The work presents a new method of integrating MT encoders into LLMs via self-distillation, enabling effective multilingual NLU especially for low-resource languages.

Findings

01. MT-LLMs outperform translate-test methods across multiple NLU tasks

02. The approach improves NLU performance across 127 predominantly low-resource languages

03. MT-LLMs maintain multilingual alignment and reduce translation errors

Abstract

LLMs have become a go-to solution not just for text generation, but also for natural language understanding (NLU) tasks. Acquiring extensive knowledge through language modeling on web-scale corpora, they excel on English NLU, yet struggle to extend their NLU capabilities to underrepresented languages. In contrast, machine translation (MT) models produce excellent multilingual representations, resulting in strong translation performance even for low-resource languages. MT encoders, however, lack the knowledge necessary for comprehensive NLU that LLMs obtain through language modeling training on immense corpora. In this work, we get the best of both worlds by integrating MT encoders directly into LLM backbones via sample-efficient self-distillation. The resulting MT-LLMs preserve the inherent multilingual representational alignment from the MT encoder, allowing lower-resource languages to…

Code & Models

Models

Hugging Face: fdschmidt93/NLLB-LLM2Vec-Meta-Llama-31-8B-Instruct-mntp-unsup-simcse (22 downloads, 6 likes)

Videos

Self-Distillation for Model Stacking Unlocks Cross-Lingual NLU in 200+ Languages (Underline)

Taxonomy

Topics: Natural Language Processing Techniques