How Much Data is Enough Data? Fine-Tuning Large Language Models for In-House Translation: Performance Evaluation Across Multiple Dataset Sizes

Inacio Vieira; Will Allred; Séamus Lankford; Sheila Castilho; Andy Way

arXiv:2409.03454 · cs.CL · September 11, 2024 · 2 cites

Open Access

TL;DR

This study evaluates how varying dataset sizes impact the performance of fine-tuned Llama 3 8B models for organisation-specific translation tasks, demonstrating that larger datasets improve translation quality across multiple languages.

Contribution

It provides empirical evidence on the relationship between dataset size and translation quality when fine-tuning LLMs with translation memories for domain-specific translation.

Findings

01

Larger datasets lead to significant improvements in BLEU and COMET scores.

02

Fine-tuning with only 1k or 2k examples can decrease performance.

03

Integrating TMs with LLMs can create effective, domain-specific translation models.

Abstract

Decoder-only LLMs have shown impressive performance in machine translation (MT) due to their ability to learn from extensive datasets and generate high-quality translations. However, LLMs often struggle with the nuances and style required for organisation-specific translation. In this study, we explore the effectiveness of fine-tuning large language models (LLMs), particularly Llama 3 8B Instruct, using translation memories (TMs) as a valuable resource to enhance accuracy and efficiency. We investigate the impact of fine-tuning the Llama 3 model using TMs from a specific organisation in the software sector. Our experiments cover five translation directions across languages of varying resource levels (English to Brazilian Portuguese, Czech, German, Finnish, and Korean). We analyse training datasets of diverse sizes (1k to 207k segments) to evaluate their influence on translation quality. We fine-tune…
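
The dataset-size comparison described in the abstract can be pictured with a small sketch. It assumes, purely for illustration, that the training sets are nested random subsets of a single translation memory; the abstract does not say how the subsets were actually drawn. The `tm` data, the `nested_subsets` helper, and the sizes used here are all hypothetical.

```python
import random

# Hypothetical translation memory: (source, target) segment pairs.
# A real TM would be loaded from a TMX or CSV export instead.
tm = [(f"source segment {i}", f"target segment {i}") for i in range(10_000)]


def nested_subsets(segments, sizes, seed=0):
    """Return {size: subset} using one fixed shuffle of the segments.

    Each smaller subset is a prefix of every larger one, so any change in
    quality between sizes comes from the added data rather than from a
    different random sample. Sizes larger than the TM are skipped.
    """
    shuffled = list(segments)
    random.Random(seed).shuffle(shuffled)
    return {n: shuffled[:n] for n in sorted(sizes) if n <= len(shuffled)}


# Illustrative sizes only; they are not taken from the paper.
subsets = nested_subsets(tm, sizes=[1_000, 2_000, 5_000])
for size, subset in subsets.items():
    print(size, len(subset))
```

Under this assumption, one model would be fine-tuned per subset and scored on the same held-out test set, so that the results are comparable across sizes.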

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Sparse Evolutionary Training · LLaMA