Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Omnilingual ASR team: Gil Keren; Artyom Kozhevnikov; Yen Meng; Christophe Ropers; Matthew Setzler; Skyler Wang; Ife Adebara; Michael Auli; Can Balioglu; Kevin Chan; Chierh Cheng; Joe Chuang; Caley Droof; Mark Duppenthaler; Paul-Ambroise Duquenne; Alexander Erben; Cynthia Gao; Gabriel Mejia Gonzalez; Kehan Lyu; Sagar Miglani; Vineel Pratap; Kaushik Ram Sadagopan; Safiyyah Saleem; Arina Turkatenko; Albert Ventayol-Boada; Zheng-Xin Yong; Yu-An Chung; Jean Maillard; Rashel Moritz; Alexandre Mourachko; Mary Williamson; Shireen Yates

arXiv:2511.09690 · cs.CL · November 14, 2025

Open Access · 6 Models · 3 Datasets

TL;DR

Omnilingual ASR is a scalable, open-source multilingual speech recognition system supporting over 1,600 languages, including many previously unserved, by leveraging self-supervised learning and community-sourced data for broad accessibility.

Contribution

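The paper introduces Omnilingual ASR, the first large-scale multilingual ASR system designed for extensibility: communities can add unserved languages from only a handful of data samples. It combines self-supervised pre-training scaled to 7B parameters with an encoder-decoder architecture built for zero-shot generalization.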

Findings

01. Supports over 1,600 languages, including 500+ new to ASR.

02. Achieves strong zero-shot generalization to unseen languages.

03. Outperforms prior systems, especially in low-resource scenarios.

Abstract

Automatic speech recognition (ASR) has advanced in high-resource languages, but most of the world's 7,000+ languages remain unsupported, leaving thousands of long-tail languages behind. Expanding ASR coverage has been costly and limited by architectures that restrict language support, making extension inaccessible to most, all while entangled with ethical concerns when pursued without community collaboration. To transcend these limitations, we introduce Omnilingual ASR, the first large-scale ASR system designed for extensibility. Omnilingual ASR enables communities to introduce unserved languages with only a handful of data samples. It scales self-supervised pre-training to 7B parameters to learn robust speech representations and introduces an encoder-decoder architecture designed for zero-shot generalization, leveraging an LLM-inspired decoder. This capability is grounded in a massive…

Taxonomy

Topics: Speech Recognition and Synthesis · ICT in Developing Communities · Phonetics and Phonology Research