The Effect of Language Diversity When Fine-Tuning Large Language Models for Translation

David Stap; Christof Monz

arXiv:2505.13090·cs.CL·September 22, 2025

TL;DR

This paper systematically investigates how language diversity during the fine-tuning of large language models affects translation quality. It finds benefits up to a certain diversity threshold and attributes the improvements to more language-agnostic representations.

Contribution

It provides a comprehensive analysis of language diversity effects in fine-tuning LLMs for translation, resolving conflicting prior findings and identifying optimal diversity levels.

Findings

01 Increased language diversity improves translation quality up to a threshold.

02 Diversity benefits are observed in both supervised and unsupervised translation.

03 More diverse models develop more language-agnostic representations.

Abstract

Prior research diverges on language diversity in LLM fine-tuning: some studies report benefits while others find no advantages. Through controlled fine-tuning experiments across 132 translation directions, we systematically resolve these disparities. We find that expanding language diversity during fine-tuning improves translation quality for both unsupervised and, surprisingly, supervised pairs, despite less diverse models being fine-tuned exclusively on these supervised pairs. However, benefits plateau or decrease beyond a certain diversity threshold. We show that increased language diversity creates more language-agnostic representations. These representational adaptations help explain the improved performance in models fine-tuned with greater diversity.
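To make the supervised/unsupervised distinction concrete, here is a minimal sketch of how a set of translation directions could be enumerated and split. It rests on assumptions not stated in the abstract: 132 directions is consistent with all ordered pairs among 12 languages (12 × 11), the language codes are placeholders, and the pivot-only fine-tuning set is a hypothetical low-diversity configuration rather than the authors' setup.

```python
from itertools import permutations


def translation_directions(languages):
    """Return every ordered (source, target) pair of distinct languages."""
    return list(permutations(languages, 2))


def split_by_supervision(all_directions, fine_tuning_directions):
    """Split directions into those seen during fine-tuning (supervised)
    and those never seen (unsupervised)."""
    seen = set(fine_tuning_directions)
    supervised = [d for d in all_directions if d in seen]
    unsupervised = [d for d in all_directions if d not in seen]
    return supervised, unsupervised


# Hypothetical language set: 12 placeholder codes, not the paper's languages.
languages = [f"lang{i:02d}" for i in range(12)]
directions = translation_directions(languages)

# Hypothetical low-diversity setup: fine-tune only on pairs that
# involve a single pivot language, in either direction.
pivot = languages[0]
fine_tuning = [d for d in directions if pivot in d]

supervised, unsupervised = split_by_supervision(directions, fine_tuning)
print(len(directions), len(supervised), len(unsupervised))
```

Under these assumptions, raising diversity means enlarging `fine_tuning` so that more directions move from the unsupervised list to the supervised one. The abstract's claim is that this also helps the pairs that were already supervised.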

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification