A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models
Haoran Xu, Young Jin Kim, Amr Sharaf, Hany Hassan Awadalla

TL;DR
This paper introduces a novel two-stage fine-tuning method for moderate-sized LLMs that significantly improves translation quality, surpassing larger models and traditional methods without relying on extensive parallel data.
Contribution
The study presents ALMA, a new fine-tuning approach that enhances translation performance of 7B and 13B LLMs using monolingual and minimal high-quality parallel data, establishing a new training paradigm.
Findings
Achieved average improvements of more than 12 BLEU and 12 COMET over the base model's zero-shot performance across 10 translation directions.
Outperformed larger models like NLLB-54B and GPT-3.5 in translation tasks.
Demonstrated effectiveness of the two-stage fine-tuning strategy for machine translation.
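The two-stage strategy can be sketched as a data-preparation step. This is a minimal illustration under stated assumptions, not the authors' code: the names `Example`, `stage1_monolingual`, `stage2_parallel`, and `build_schedule`, the prompt template, and the toy sentences are all hypothetical. Stage 1 treats monolingual text as plain next-token-prediction data; stage 2 turns a small parallel set into prompt/target pairs.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Example:
    """One training example: the model reads `prompt` and learns to emit `target`."""
    prompt: str
    target: str
    stage: int


def stage1_monolingual(texts: Iterable[str]) -> List[Example]:
    # Stage 1: language modelling on monolingual text.
    # There is no source sentence, so the prompt is empty and the
    # whole (non-blank) text is the prediction target.
    return [Example(prompt="", target=t.strip(), stage=1) for t in texts if t.strip()]


def stage2_parallel(
    pairs: Iterable[Tuple[str, str]], src_lang: str, tgt_lang: str
) -> List[Example]:
    # Stage 2: a small set of high-quality parallel sentences.
    # ASSUMPTION: this prompt template is illustrative only.
    template = "Translate this from {sl} to {tl}:\n{sl}: {src}\n{tl}:"
    return [
        Example(
            prompt=template.format(sl=src_lang, tl=tgt_lang, src=src.strip()),
            target=tgt.strip(),
            stage=2,
        )
        for src, tgt in pairs
    ]


def build_schedule(mono: List[Example], parallel: List[Example]) -> List[Example]:
    # All monolingual examples come first, then the parallel examples,
    # mirroring the "monolingual, then parallel" order of the two stages.
    return mono + parallel


# Toy data, invented for this sketch.
mono = stage1_monolingual(["Das Wetter ist heute schön.", "  ", "The weather is nice today."])
para = stage2_parallel([("Guten Morgen.", "Good morning.")], "German", "English")
schedule = build_schedule(mono, para)
```

In a real run each stage would be a separate fine-tuning pass over its own data; the single ordered list here only makes the ordering explicit.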
Abstract
Generative Large Language Models (LLMs) have achieved remarkable advancements in various NLP tasks. However, these advances have not been reflected in the translation task, especially those with moderate model sizes (i.e., 7B or 13B parameters), which still lag behind conventional supervised encoder-decoder translation models. Previous studies have attempted to improve the translation capabilities of these moderate LLMs, but their gains have been limited. In this study, we propose a novel fine-tuning approach for LLMs that is specifically designed for the translation task, eliminating the need for the abundant parallel data that traditional translation models usually depend on. Our approach consists of two fine-tuning stages: initial fine-tuning on monolingual data followed by subsequent fine-tuning on a small set of high-quality parallel data. We introduce the LLM developed through…
Code & Models
- 🤗 haoranxu/ALMA-7B · model · 954 dl · ♡ 26
- 🤗 haoranxu/ALMA-7B-Pretrain-LoRA · model · ♡ 3
- 🤗 haoranxu/ALMA-7B-Pretrain · model · 1.3k dl · ♡ 4
- 🤗 haoranxu/ALMA-13B · model · 1.2k dl · ♡ 37
- 🤗 haoranxu/ALMA-13B-Pretrain · model · 7.9k dl · ♡ 10
- 🤗 haoranxu/ALMA-13B-Pretrain-LoRA · model · ♡ 7
- 🤗 TheBloke/ALMA-7B-Pretrain-GPTQ · model · 13 dl · ♡ 1
- 🤗 TheBloke/ALMA-7B-Pretrain-AWQ · model · 6 dl · ♡ 2
- 🤗 TheBloke/ALMA-7B-Pretrain-GGUF · model · 96 dl · ♡ 5
- 🤗 TheBloke/ALMA-13B-Pretrain-GGUF · model · 213 dl · ♡ 12
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: Multi-Head Attention · Attention Is All You Need · Attention Dropout · Residual Connection · Adam · Weight Decay · Dropout · Cosine Annealing
