Fine-Tuning Large Language Models to Translate: Will a Touch of Noisy Data in Misaligned Languages Suffice?
Dawei Zhu, Pinzhen Chen, Miaoran Zhang, Barry Haddow, Xiaoyu Shen, Dietrich Klakow

TL;DR
This paper investigates how fine-tuning large language models with minimal and noisy data affects multilingual translation, revealing that small datasets can be effective but require careful handling of language directions and data quality.
Contribution
It demonstrates that LLMs can achieve strong translation performance with very limited data and highlights the importance of data direction and quality in fine-tuning for multilingual translation.
Findings
LLMs display strong translation capability after fine-tuning on as few as 32 parallel sentences.
Fine-tuning on one language direction can enable multi-directional translation.
Noisy synthetic data on the target side causes problems, especially when the target language is well-represented in LLM pre-training.
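To make the small-data, single-direction setup concrete, here is a minimal sketch of how a fine-tuning set of that size might be assembled. The prompt template, the `Example` class, and the `build_examples` helper are assumptions for illustration only; the summary does not specify the prompt format or tooling used in the paper.

```python
import random
from dataclasses import dataclass


@dataclass
class Example:
    prompt: str   # instruction plus source sentence
    target: str   # reference translation the model should produce


# Hypothetical prompt template, not taken from the paper.
TEMPLATE = "Translate the following text from {src} to {tgt}.\n{src}: {text}\n{tgt}:"


def build_examples(pairs, src_lang, tgt_lang, n=32, seed=0):
    """Sample up to n parallel pairs in ONE direction and wrap each in a prompt."""
    rng = random.Random(seed)
    sample = rng.sample(pairs, min(n, len(pairs)))
    return [
        Example(
            prompt=TEMPLATE.format(src=src_lang, tgt=tgt_lang, text=source),
            target=" " + translation,
        )
        for source, translation in sample
    ]


if __name__ == "__main__":
    # Toy parallel corpus standing in for real English-German data.
    toy_pairs = [(f"source sentence {i}", f"target sentence {i}") for i in range(100)]
    examples = build_examples(toy_pairs, "English", "German")
    print(len(examples))
    print(examples[0].prompt)
```

Swapping the `src_lang` and `tgt_lang` arguments (and the order of each pair) changes which language sits on the target side, which is the choice the findings identify as critical.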
Abstract
Traditionally, success in multilingual machine translation can be attributed to three key factors in training data: large volume, diverse translation directions, and high quality. In the current practice of fine-tuning large language models (LLMs) for translation, we revisit the importance of these factors. We find that LLMs display strong translation capability after being fine-tuned on as few as 32 parallel sentences and that fine-tuning on a single translation direction enables translation in multiple directions. However, the choice of direction is critical: fine-tuning LLMs with only English on the target side can lead to task misinterpretation, which hinders translation into non-English languages. Problems also arise when noisy synthetic data is placed on the target side, especially when the target language is well-represented in LLM pre-training. Yet interestingly, synthesized…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
