Tower: An Open Multilingual Large Language Model for Translation-Related Tasks
Duarte M. Alves, José Pombal, Nuno M. Guerreiro, Pedro H. Martins, João Alves, Amin Farajian, Ben Peters, Ricardo Rei, Patrick Fernandes, Sweta Agrawal, Pierre Colombo, José G.C. de Souza, André F.T. Martins

TL;DR
This paper introduces Tower, a multilingual large language model tailored for translation-related tasks. Built through continued pretraining followed by instruction fine-tuning, it outperforms open models on several of these tasks and is competitive with closed models in translation workflows.
Contribution
The paper presents a recipe for adapting open LLMs to multiple translation-related tasks via continued pretraining and instruction fine-tuning, and releases the models and resources for future research.
Findings
Tower surpasses open models on translation tasks.
Tower is competitive with closed LLMs.
Resources and benchmarks are publicly released.
Abstract
While general-purpose large language models (LLMs) demonstrate proficiency on multiple tasks within the domain of translation, approaches based on open LLMs are competitive only when specializing on a single task. In this paper, we propose a recipe for tailoring LLMs to multiple tasks present in translation workflows. We perform continued pretraining on a multilingual mixture of monolingual and parallel data, creating TowerBase, followed by finetuning on instructions relevant for translation processes, creating TowerInstruct. Our final model surpasses open alternatives on several tasks relevant to translation workflows and is competitive with general-purpose closed LLMs. To facilitate future research, we release the Tower models, our specialization dataset, an evaluation framework for LLMs focusing on the translation ecosystem, and a collection of model generations, including ours, on…
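To make "finetuning on instructions relevant for translation processes" more concrete, the following is a minimal sketch of how a parallel sentence pair could be turned into an instruction-style training record. The template wording, the `InstructionRecord` schema, and the `make_translation_record` helper are all assumptions for illustration; they are not taken from the paper or its released dataset.

```python
from dataclasses import dataclass


@dataclass
class InstructionRecord:
    """One supervised fine-tuning example (hypothetical schema)."""
    prompt: str
    completion: str


def make_translation_record(src_lang: str, tgt_lang: str,
                            src_text: str, tgt_text: str) -> InstructionRecord:
    # The template wording below is an assumption for illustration only.
    prompt = (
        f"Translate the following text from {src_lang} into {tgt_lang}.\n"
        f"{src_lang}: {src_text}\n"
        f"{tgt_lang}:"
    )
    # The leading space separates the completion from the trailing colon.
    return InstructionRecord(prompt=prompt, completion=" " + tgt_text)


record = make_translation_record(
    "Portuguese", "English", "Bom dia.", "Good morning."
)
print(record.prompt)
print(record.completion)
```

Other translation-related tasks could plausibly reuse the same record shape with a different prompt template, which is one reason to keep the schema separate from the template.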
Code & Models
- 🤗 Unbabel/TowerInstruct-13B-v0.1 (model) · 1.3k downloads · 35 likes
- 🤗 BSC-LT/salamandraTA-7b-instruct (model) · 1.6k downloads · 25 likes
- 🤗 Unbabel/TowerBase-7B-v0.1 (model) · 860 downloads · 56 likes
- 🤗 Unbabel/TowerInstruct-7B-v0.1 (model) · 739 downloads · 65 likes
- 🤗 Unbabel/TowerBase-13B-v0.1 (model) · 12 downloads · 6 likes
- 🤗 Unbabel/TowerInstruct-7B-v0.2 (model) · 1.5k downloads · 40 likes
- 🤗 RichardErkhov/Unbabel_-_TowerInstruct-7B-v0.1-4bits (model) · 3 downloads
- 🤗 RichardErkhov/Unbabel_-_TowerInstruct-7B-v0.1-8bits (model)
- 🤗 RichardErkhov/Unbabel_-_TowerBase-7B-v0.1-4bits (model) · 1 download
- 🤗 RichardErkhov/Unbabel_-_TowerInstruct-7B-v0.1-gguf (model) · 13 downloads
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning in Materials Science
