How Multilingual Are Large Language Models Fine-Tuned for Translation?
Aquia Richburg, Marine Carpuat

TL;DR
This paper evaluates how fine-tuning large language models for translation affects their ability to handle multiple languages, especially zero-shot and non-English tasks, revealing uneven improvements across language pairs.
Contribution
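It provides an extensive empirical analysis of the translation quality of the TOWER family of fine-tuned LLMs on 132 translation tasks from the multi-parallel FLORES-200 data, examining the effect of translation fine-tuning on zero-shot languages, zero-shot language pairs, and translation tasks that do not involve English.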
Findings
Fine-tuning improves zero-shot translation quality on average.
The impact varies substantially across language pairs.
Further research is needed to enable effective massively multilingual translation.
Abstract
A new paradigm for machine translation has recently emerged: fine-tuning large language models (LLMs) on parallel text has been shown to outperform dedicated translation systems trained in a supervised fashion on much larger amounts of parallel data (Xu et al., 2024a; Alves et al., 2024). However, it remains unclear whether this paradigm can enable massively multilingual machine translation or whether it requires fine-tuning dedicated models for a small number of language pairs. How does translation fine-tuning impact the MT capabilities of LLMs for zero-shot languages, zero-shot language pairs, and translation tasks that do not involve English? To address these questions, we conduct an extensive empirical evaluation of the translation quality of the TOWER family of language models (Alves et al., 2024) on 132 translation tasks from the multi-parallel FLORES-200 data. We find that…
Taxonomy
Topics: Natural Language Processing Techniques
