EuroLLM-22B: Technical Report
Miguel Moura Ramos, Duarte M. Alves, Hippolyte Gisserot-Boukhlef, João Alves, Pedro Henrique Martins, Patrick Fernandes, José Pombal, Nuno M. Guerreiro, Ricardo Rei, Nicolas Boizard, Amin Farajian, Mateusz Klimaszewski, José G. C. de Souza, Barry Haddow, François Yvon

TL;DR
EuroLLM-22B is a large multilingual language model built for European languages. It addresses their underrepresentation in existing open models and performs strongly on reasoning, instruction-following, and translation benchmarks.
Contribution
This paper introduces EuroLLM-22B, a multilingual model trained from scratch to support all 24 official EU languages and 11 additional languages. It documents the tokenizer design, architecture, data filtering, and training procedures, and releases the resulting models and data.
Findings
EuroLLM-22B performs well on reasoning, instruction following, and translation benchmarks.
The model achieves results competitive with similar-sized models.
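The base and instruction-tuned models, the multilingual web pretraining data, and the updated EuroBlocks instruction datasets are publicly released for future research.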
Abstract
This report presents EuroLLM-22B, a large language model trained from scratch to support the needs of European citizens by covering all 24 official European Union languages and 11 additional languages. EuroLLM addresses the issue of European languages being underrepresented and underserved in existing open large language models. We provide a comprehensive overview of EuroLLM-22B's development, including tokenizer design, architectural specifications, data filtering, and training procedures. Across a broad set of multilingual benchmarks, EuroLLM-22B demonstrates strong performance in reasoning, instruction following, and translation, achieving results competitive with models of comparable size. To support future research, we release our base and instruction-tuned models, our multilingual web pretraining data and updated EuroBlocks instruction datasets, as well as our pre-training and…
Code & Models
- 🤗 utter-project/EuroLLM-22B-Instruct-2512 (model) · 3.9k dl · ♡ 62
- 🤗 utter-project/EuroLLM-9B-2512 (model) · 1.4k dl · ♡ 1
- 🤗 utter-project/EuroLLM-22B-2512 (model) · 1.6k dl · ♡ 14
- 🤗 utter-project/EuroLLM-9B-Instruct-2512 (model) · 11k dl · ♡ 6
- 🤗 aifeifei798/EuroLLM-22B-Instruct-2512-FT (model) · 9 dl
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling
