Voxtral

Alexander H. Liu; Andy Ehrenberg; Andy Lo; Clément Denoix; Corentin Barreau; Guillaume Lample; Jean-Malo Delignon; Khyathi Raghavi Chandu; Patrick von Platen; Pavankumar Reddy Muddireddy; Sanchit Gandhi; Soham Ghosh; Srijan Mishra; Thomas Foubert; Abhinav Rastogi; Adam Yang; Albert Q. Jiang; Alexandre Sablayrolles; Amélie Héliou; Amélie Martin; Anmol Agarwal; Antoine Roux; Arthur Darcet; Arthur Mensch; Baptiste Bout; Baptiste Rozière; Baudouin De Monicault; Chris Bamford; Christian Wallenwein; Christophe Renaudin; Clémence Lanfranchi; Darius Dabert; Devendra Singh Chaplot; Devon Mizelle; Diego de las Casas; Elliot Chane-Sane; Emilien Fugier; Emma Bou Hanna; Gabrielle Berrada; Gauthier Delerce; Gauthier Guinet; Georgii Novikov; Guillaume Martin; Himanshu Jaju; Jan Ludziejewski; Jason Rute; Jean-Hadrien Chabran; Jessica Chudnovsky; Joachim Studnia; Joep Barmentlo; Jonas Amar; Josselin Somerville Roberts; Julien Denize; Karan Saxena; Karmesh Yadav; Kartik Khandelwal; Kush Jain; Lélio Renard Lavaud; Léonard Blier; Lingxiao Zhao; Louis Martin; Lucile Saulnier; Luyu Gao; Marie Pellat; Mathilde Guillaumin; Mathis Felardos; Matthieu Dinot; Maxime Darrin; Maximilian Augustin; Mickaël Seznec; Neha Gupta; Nikhil Raghuraman; Olivier Duchenne; Patricia Wang; Patryk Saffer; Paul Jacob; Paul Wambergue; Paula Kurylowicz; Philomène Chagniot; Pierre Stock; Pravesh Agrawal; Rémi Delacourt; Romain Sauvestre; Roman Soletskyi; Sagar Vaze; Sandeep Subramanian; Saurabh Garg; Shashwat Dalal; Siddharth Gandhi; Sumukh Aithal; Szymon Antoniak; Teven Le Scao; Thibault Schueller; Thibaut Lavril; Thomas Robert; Thomas Wang; Timothée Lacroix; Tom Bewley; Valeriia Nemychnikova; Victor Paltz; Virgile Richard; Wen-Ding Li; William Marshall; Xuanyu Zhang; Yihan Wan; Yunhao Tang

arXiv:2507.13264·cs.SD·July 18, 2025

Open Access 5 Models

TL;DR

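The paper introduces Voxtral Mini and Voxtral Small, two multimodal audio chat models that understand both spoken audio and text and achieve state-of-the-art performance across a range of audio benchmarks. Voxtral Small is small enough to run locally, and a 32K context window lets the models handle audio files up to 40 minutes long as well as long multi-turn conversations.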

Contribution

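The paper presents Voxtral Mini and Voxtral Small, two models with state-of-the-art audio understanding and a 32K context window for long audio and multi-turn conversations, and contributes three new benchmarks for evaluating speech understanding models on knowledge and trivia.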

Findings

01

Voxtral achieves state-of-the-art performance across a diverse range of audio benchmarks while preserving strong text capabilities.

02

Voxtral Small outperforms a number of closed-source models while being small enough to run locally.

03

Three new benchmarks are introduced for evaluating speech understanding models on knowledge and trivia.

Abstract

We present Voxtral Mini and Voxtral Small, two multimodal audio chat models. Voxtral is trained to comprehend both spoken audio and text documents, achieving state-of-the-art performance across a diverse range of audio benchmarks, while preserving strong text capabilities. Voxtral Small outperforms a number of closed-source models, while being small enough to run locally. A 32K context window enables the model to handle audio files up to 40 minutes in duration and long multi-turn conversations. We also contribute three benchmarks for evaluating speech understanding models on knowledge and trivia. Both Voxtral models are released under the Apache 2.0 license.
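
The two figures quoted in the abstract, a 32K context window and audio of up to 40 minutes, imply a rough ceiling on how many tokens each second of audio can occupy. The sketch below works through that arithmetic only. It assumes that "32K" means 32,768 tokens and that the whole context is spent on audio, neither of which is stated on this page; the function name is hypothetical and the result is not the model's actual audio token rate.

```python
def implied_audio_token_rate(context_tokens: int, audio_minutes: float) -> float:
    """Return the maximum tokens per second of audio that would still let
    `audio_minutes` of audio fit entirely inside `context_tokens`."""
    if context_tokens <= 0 or audio_minutes <= 0:
        raise ValueError("context_tokens and audio_minutes must be positive")
    audio_seconds = audio_minutes * 60.0
    return context_tokens / audio_seconds


# Assumption: "32K" is read as 32 * 1024 tokens; the page does not say so.
CONTEXT_TOKENS = 32 * 1024
# From the abstract: audio files up to 40 minutes in duration.
AUDIO_MINUTES = 40

rate = implied_audio_token_rate(CONTEXT_TOKENS, AUDIO_MINUTES)
print(f"At most {rate:.2f} tokens per second of audio")
```

Any text prompt or generated answer also consumes context, so the real per-second budget for audio would be somewhat below this ceiling.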
