Aya Expanse: Combining Research Breakthroughs for a New Multilingual Frontier
John Dang, Shivalika Singh, Daniel D'souza, Arash Ahmadian, Alejandro, Salamanca, Madeline Smith, Aidan Peppin, Sungjin Hong, Manoj Govindassamy,, Terrence Zhao, Sandra Kublik, Meor Amer, Viraat Aryabumi, Jon Ander Campos,, Yi-Chern Tan, Tom Kocmi, Florian Strub

TL;DR
Aya Expanse introduces a new family of large multilingual models that outperform existing models across 23 languages, setting a new state-of-the-art in multilingual NLP performance.
Contribution
The paper presents the Aya Expanse model family, combining research advancements to achieve superior multilingual performance and releasing open weights and a new multilingual evaluation dataset.
Findings
Aya Expanse models outperform leading open-weight models in multilingual tasks.
Aya Expanse 32B surpasses larger models like Llama 3.1 70B in win-rate.
The models achieve up to 76.6% win-rate on the Arena-Hard-Auto dataset.
Abstract
We introduce the Aya Expanse model family, a new generation of 8B and 32B parameter multilingual language models, aiming to address the critical challenge of developing highly performant multilingual models that match or surpass the capabilities of monolingual models. By leveraging several years of research at Cohere For AI and Cohere, including advancements in data arbitrage, multilingual preference training, and model merging, Aya Expanse sets a new state-of-the-art in multilingual performance. Our evaluations on the Arena-Hard-Auto dataset, translated into 23 languages, demonstrate that Aya Expanse 8B and 32B outperform leading open-weight models in their respective parameter classes, including Gemma 2, Qwen 2.5, and Llama 3.1, achieving up to a 76.6% win-rate. Notably, Aya Expanse 32B outperforms Llama 3.1 70B, a model with twice as many parameters, achieving a 54.0% win-rate. In…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
- 🤗CohereLabs/aya-expanse-8bmodel· 16k dl· ♡ 42316k dl♡ 423
- 🤗CohereLabs/aya-expanse-32bmodel· 6.7k dl· ♡ 2896.7k dl♡ 289
- 🤗jebish7/aya-expanse-8bmodel· 1 dl1 dl
- 🤗jebish7/aya-expanse-32bmodel· 1 dl1 dl
- 🤗CohereLabs/aya-vision-8bmodel· 83k dl· ♡ 31683k dl♡ 316
- 🤗CohereLabs/aya-vision-32bmodel· 266 dl· ♡ 223266 dl♡ 223
- 🤗unsloth/aya-vision-8bmodel· 34 dl· ♡ 134 dl♡ 1
- 🤗unsloth/aya-vision-8b-unsloth-bnb-4bitmodel· 40 dl40 dl
- 🤗unsloth/aya-vision-8b-bnb-4bitmodel· 26 dl· ♡ 226 dl♡ 2
- 🤗unsloth/aya-vision-32bmodel· 30 dl· ♡ 330 dl♡ 3
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
MethodsLLaMA
