Investigating the translation capabilities of Large Language Models trained on parallel data only

Javier García Gilabert; Carlos Escolano; Aleix Sant Savall; Francesca De Luca Fornaciari; Audrey Mash; Xixian Liao; Maite Melero

arXiv:2406.09140·cs.CL·June 14, 2024

TL;DR

This paper introduces PLUME, a collection of three 2B LLMs trained solely on Catalan-centric parallel data. The models reach translation quality comparable to earlier encoder-decoder systems without instruction fine-tuning or continual pre-training, and they are used to study how LLMs translate.

Contribution

The work presents the first LLMs trained exclusively on parallel data for translation, analyzing their performance and cross-lingual representations without instruction fine-tuning.

Findings

01

PLUME models perform comparably to previous encoder-decoder models on 16 supervised and 56 zero-shot translation directions.

02

Prompt design significantly influences translation performance.

03

LLMs exhibit meaningful cross-lingual representations even without fine-tuning.

Abstract

In recent years, Large Language Models (LLMs) have demonstrated exceptional proficiency across a broad spectrum of Natural Language Processing (NLP) tasks, including Machine Translation. However, previous methods predominantly relied on iterative processes such as instruction fine-tuning or continual pre-training, leaving unexplored the challenges of training LLMs solely on parallel data. In this work, we introduce PLUME (Parallel Language Model), a collection of three 2B LLMs featuring varying vocabulary sizes (32k, 128k, and 256k) trained exclusively on Catalan-centric parallel examples. These models perform comparably to previous encoder-decoder architectures on 16 supervised translation directions and 56 zero-shot ones. Utilizing this set of models, we conduct a thorough investigation into the translation capabilities of LLMs, probing their performance, the impact of the different…
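The direction counts in the abstract are consistent with a setup of Catalan plus eight other languages: each of the eight pairs with Catalan in both directions (2 × 8 = 16 supervised), and every ordered pair among the eight that skips Catalan never appears as a pair in training (8 × 7 = 56 zero-shot). The sketch below checks that arithmetic. The language codes are hypothetical placeholders, and the count of eight is an inference from the numbers, not something stated in the text above.

```python
from itertools import permutations

PIVOT = "ca"  # Catalan, the pivot language named in the abstract
# Hypothetical placeholder codes; the count of eight is inferred, not stated.
OTHERS = [f"lang{i}" for i in range(1, 9)]


def supervised_directions(pivot, others):
    """Directions with parallel data: pivot -> other and other -> pivot."""
    return [(pivot, o) for o in others] + [(o, pivot) for o in others]


def zero_shot_directions(others):
    """Ordered pairs of non-pivot languages, unseen as a pair in training."""
    return list(permutations(others, 2))


supervised = supervised_directions(PIVOT, OTHERS)
zero_shot = zero_shot_directions(OTHERS)

print(len(supervised), len(zero_shot))
```

Under these assumptions the two lists have 16 and 56 entries, matching the figures quoted in the abstract.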

Code & Models

Repositories

projecte-aina/plume (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Sparse Evolutionary Training