Magistral

Mistral-AI: Abhinav Rastogi; Albert Q. Jiang; Andy Lo; Gabrielle Berrada; Guillaume Lample; Jason Rute; Joep Barmentlo; Karmesh Yadav; Kartik Khandelwal; Khyathi Raghavi Chandu; Léonard Blier; Lucile Saulnier; Matthieu Dinot; Maxime Darrin; Neha Gupta; Roman Soletskyi; Sagar Vaze; Teven Le Scao; Yihan Wang; Adam Yang; Alexander H. Liu; Alexandre Sablayrolles; Amélie Héliou; Amélie Martin; Andy Ehrenberg; Anmol Agarwal; Antoine Roux; Arthur Darcet; Arthur Mensch; Baptiste Bout; Baptiste Rozière; Baudouin De Monicault; Chris Bamford; Christian Wallenwein; Christophe Renaudin; Clémence Lanfranchi; Darius Dabert; Devon Mizelle; Diego de las Casas; Elliot Chane-Sane; Emilien Fugier; Emma Bou Hanna; Gauthier Delerce; Gauthier Guinet; Georgii Novikov; Guillaume Martin; Himanshu Jaju; Jan Ludziejewski; Jean-Hadrien Chabran; Jean-Malo Delignon; Joachim Studnia; Jonas Amar; Josselin Somerville Roberts; Julien Denize; Karan Saxena; Kush Jain; Lingxiao Zhao; Louis Martin; Luyu Gao; Lélio Renard Lavaud; Marie Pellat; Mathilde Guillaumin; Mathis Felardos; Maximilian Augustin; Mickaël Seznec; Nikhil Raghuraman; Olivier Duchenne; Patricia Wang; Patrick von Platen; Patryk Saffer; Paul Jacob; Paul Wambergue; Paula Kurylowicz; Pavankumar Reddy Muddireddy; Philomène Chagniot; Pierre Stock; Pravesh Agrawal; Romain Sauvestre; Rémi Delacourt; Sanchit Gandhi; Sandeep Subramanian; Shashwat Dalal; Siddharth Gandhi; Soham Ghosh; Srijan Mishra; Sumukh Aithal; Szymon Antoniak; Thibault Schueller; Thibaut Lavril; Thomas Robert; Thomas Wang; Timothée Lacroix; Valeriia Nemychnikova; Victor Paltz; Virgile Richard; Wen-Ding Li; William Marshall; Xuanyu Zhang; Yunhao Tang

arXiv:2506.10910 · cs.CL · June 13, 2025 · 2 cites

Open Access · 10 Models · 1 Dataset

TL;DR

The paper introduces Magistral, Mistral's first reasoning model, together with a scalable RL training pipeline built from the ground up. It reports that RL on text data alone improves reasoning while maintaining or improving multimodal understanding, instruction following, and function calling, without relying on RL traces distilled from prior models.

Contribution

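The paper presents Magistral, a reasoning model trained with a scalable RL pipeline that the authors built from scratch on their own models and infrastructure. It shows that pure RL on text data can improve LLM reasoning while preserving most of the initial checkpoint's capabilities, including multimodal understanding.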

Findings

01. RL on text data alone maintains or improves multimodal understanding, instruction following, and function calling

02. Pure RL training enhances reasoning and instruction following

03. Magistral Small is open-sourced under Apache 2.0 for broader research use

Abstract

We introduce Magistral, Mistral's first reasoning model and our own scalable reinforcement learning (RL) pipeline. Instead of relying on existing implementations and RL traces distilled from prior models, we follow a ground-up approach, relying solely on our own models and infrastructure. Notably, we demonstrate a stack that enabled us to explore the limits of pure RL training of LLMs, present a simple method to force the reasoning language of the model, and show that RL on text data alone maintains most of the initial checkpoint's capabilities. We find that RL on text maintains or improves multimodal understanding, instruction following, and function calling. We present Magistral Medium, trained for reasoning on top of Mistral Medium 3 with RL alone, and we open-source Magistral Small (Apache 2.0), which further includes cold-start data from Magistral Medium.

Code & Models

Datasets

ChuGyouk/DeepMath-Filtered-59.9K
dataset · 5 dl

Taxonomy

Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Topic Modeling