Investigating the translation capabilities of Large Language Models trained on parallel data only
Javier García Gilabert, Carlos Escolano, Aleix Sant Savall, Francesca De Luca Fornaciari, Audrey Mash, Xixian Liao, Maite Melero

TL;DR
This paper introduces PLUME, a set of large language models trained solely on parallel data for Catalan, demonstrating competitive translation performance and providing insights into LLM translation capabilities without fine-tuning.
Contribution
The work presents the first LLMs trained exclusively on parallel data for translation, analyzing their performance and cross-lingual representations without instruction fine-tuning.
Findings
PLUME models perform comparably to encoder-decoder models on multiple translation tasks.
Prompt design significantly influences translation performance.
LLMs exhibit meaningful cross-lingual representations even without fine-tuning.
Abstract
In recent years, Large Language Models (LLMs) have demonstrated exceptional proficiency across a broad spectrum of Natural Language Processing (NLP) tasks, including Machine Translation. However, previous methods predominantly relied on iterative processes such as instruction fine-tuning or continual pre-training, leaving unexplored the challenges of training LLMs solely on parallel data. In this work, we introduce PLUME (Parallel Language Model), a collection of three 2B LLMs featuring varying vocabulary sizes (32k, 128k, and 256k) trained exclusively on Catalan-centric parallel examples. These models perform comparably to previous encoder-decoder architectures on 16 supervised translation directions and 56 zero-shot ones. Utilizing this set of models, we conduct a thorough investigation into the translation capabilities of LLMs, probing their performance, the impact of the different…
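The direction counts in the abstract are consistent with a Catalan-centric setup in which every training pair has Catalan on one side. With n other languages, the supervised directions number 2n (Catalan to X and X to Catalan). The zero-shot directions are the ordered pairs of distinct non-Catalan languages, n(n - 1). Both 2n = 16 and n(n - 1) = 56 give n = 8. The sketch below enumerates these directions under that assumption. The language labels are hypothetical placeholders because this page does not list the languages.

```python
from itertools import permutations

# Hypothetical placeholder labels: only the number of non-Catalan languages
# matters here (n = 8, inferred from 2n = 16 supervised directions).
PIVOT = "cat"
OTHERS = [f"lang{i}" for i in range(1, 9)]


def supervised_directions(pivot, others):
    """Directions seen in training: pivot -> X and X -> pivot."""
    return [(pivot, x) for x in others] + [(x, pivot) for x in others]


def zero_shot_directions(others):
    """Directions never seen in training: X -> Y for distinct non-pivot X, Y."""
    return list(permutations(others, 2))


sup = supervised_directions(PIVOT, OTHERS)
zs = zero_shot_directions(OTHERS)
print(len(sup), len(zs))  # 16 56
```

The same two formulas tie the abstract's numbers together: adding one more Catalan-paired language would add 2 supervised directions and 2n new zero-shot directions.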
Code & Models
- 🤗 projecte-aina/Plume32k (model)
- 🤗 projecte-aina/Plume128k (model)
- 🤗 projecte-aina/Plume256k (model)
- 🤗 RichardErkhov/projecte-aina_-_Plume32k-4bits (model)
- 🤗 RichardErkhov/projecte-aina_-_Plume32k-8bits (model)
- 🤗 RichardErkhov/projecte-aina_-_Plume256k-8bits (model)
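The PLUME checkpoints are hosted on the Hugging Face Hub. What follows is a minimal sketch of prompting one of them for translation with the `transformers` library. The prompt template (`<s> [src] sentence \n[tgt]`) and the FLORES-style language codes (`spa_Latn`, `cat_Latn`) are assumptions recalled from the model cards rather than stated on this page, so check them against the card before relying on them. Given the finding that prompt design significantly influences translation performance, a mismatched template could noticeably degrade output.

```python
# Minimal sketch: prompting a PLUME checkpoint for translation.
# ASSUMPTION: the prompt template and language codes are recalled from the
# projecte-aina model cards, not taken from the paper page.


def build_prompt(src_code: str, tgt_code: str, sentence: str) -> str:
    """Assumed template: source tag, source sentence, newline, target tag."""
    return "<s> [{}] {} \n[{}]".format(src_code, sentence, tgt_code)


def translate(sentence: str,
              src_code: str = "spa_Latn",
              tgt_code: str = "cat_Latn",
              model_id: str = "projecte-aina/Plume32k") -> str:
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    prompt = build_prompt(src_code, tgt_code, sentence)
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    # Beam search; for a decoder-only model the output also contains the prompt.
    output_ids = model.generate(input_ids, max_new_tokens=200, num_beams=5)
    # Decode only the tokens generated after the prompt.
    new_tokens = output_ids[0, input_ids.shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

Usage would be along the lines of `translate("Ayer se fue.")`, which downloads the 32k-vocabulary model on first call. Keeping `build_prompt` separate from the model call makes it easy to compare prompt variants. Swapping `model_id` for the 128k or 256k checkpoint would then let you compare vocabulary sizes with the same template.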
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Sparse Evolutionary Training
