A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models

Haoran Xu; Young Jin Kim; Amr Sharaf; Hany Hassan Awadalla

arXiv:2309.11674 · cs.CL · February 7, 2024 · 27 cites

TL;DR

This paper introduces a novel two-stage fine-tuning method for moderate-sized LLMs that significantly improves translation quality, surpassing larger models and traditional methods without relying on extensive parallel data.

Contribution

The study presents ALMA, an LLM-based translation model built with a two-stage fine-tuning approach that improves the translation performance of 7B and 13B LLMs using monolingual data and a small amount of high-quality parallel data, establishing a new training paradigm.

Findings

01

Achieved an average improvement of more than 12 BLEU and 12 COMET over the base model's zero-shot performance across 10 translation directions.

02

Outperformed larger models like NLLB-54B and GPT-3.5 in translation tasks.

03

Demonstrated effectiveness of the two-stage fine-tuning strategy for machine translation.

Abstract

Generative Large Language Models (LLMs) have achieved remarkable advancements in various NLP tasks. However, these advances have not been reflected in the translation task, especially those with moderate model sizes (i.e., 7B or 13B parameters), which still lag behind conventional supervised encoder-decoder translation models. Previous studies have attempted to improve the translation capabilities of these moderate LLMs, but their gains have been limited. In this study, we propose a novel fine-tuning approach for LLMs that is specifically designed for the translation task, eliminating the need for the abundant parallel data that traditional translation models usually depend on. Our approach consists of two fine-tuning stages: initial fine-tuning on monolingual data followed by subsequent fine-tuning on a small set of high-quality parallel data. We introduce the LLM developed through…
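The two-stage recipe described in the abstract (monolingual fine-tuning first, then fine-tuning on a small parallel set) can be pictured as a simple data schedule. The following is a minimal sketch, not the authors' implementation: the names `Stage`, `format_parallel`, and `build_schedule` are hypothetical, the prompt template is an illustrative placeholder rather than the one used in the paper, the epoch counts are arbitrary toy values, and no model is actually trained.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stage:
    """One fine-tuning stage: a label, its training texts, and a pass count."""
    name: str
    texts: List[str]
    epochs: int = 1


def format_parallel(src_lang: str, tgt_lang: str, src: str, tgt: str) -> str:
    # Hypothetical prompt template for a sentence pair (an assumption,
    # not taken from the paper).
    return f"Translate from {src_lang} to {tgt_lang}.\n{src_lang}: {src}\n{tgt_lang}: {tgt}"


def build_schedule(stages: List[Stage]) -> List[Tuple[str, str]]:
    """Return (stage_name, text) pairs in the order a trainer would see them."""
    schedule: List[Tuple[str, str]] = []
    for stage in stages:  # stages run strictly one after another
        for _ in range(stage.epochs):
            schedule.extend((stage.name, text) for text in stage.texts)
    return schedule


# Toy data: stage 1 uses plain monolingual text, stage 2 uses a few pairs.
monolingual = ["Das Wetter ist heute schön.", "The weather is nice today."]
parallel = [("Guten Morgen.", "Good morning.")]

stages = [
    Stage("stage1_monolingual", monolingual, epochs=1),
    Stage(
        "stage2_parallel",
        [format_parallel("German", "English", s, t) for s, t in parallel],
        epochs=2,  # arbitrary toy value
    ),
]
schedule = build_schedule(stages)
```

In a real setup each `(stage_name, text)` pair would be tokenized and fed to a causal language modeling trainer; the point of the sketch is only the ordering, with all monolingual data consumed before any parallel data.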

Code & Models

Repositories

fe1ixxu/alma (PyTorch, official)

Videos

A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models · SlidesLive

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification

Methods: Multi-Head Attention · Attention Is All You Need · Attention Dropout · Residual Connection · Adam · Weight Decay · Dropout · Cosine Annealing