Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
Omnilingual ASR team: Gil Keren, Artyom Kozhevnikov, Yen Meng, Christophe Ropers, Matthew Setzler, Skyler Wang, Ife Adebara, Michael Auli, Can Balioglu, Kevin Chan, Chierh Cheng, Joe Chuang, Caley Droof, Mark Duppenthaler, Paul-Ambroise Duquenne, Alexander Erben, Cynthia Gao

TL;DR
Omnilingual ASR is a scalable, open-source multilingual speech recognition system supporting over 1,600 languages, including many previously unserved, by leveraging self-supervised learning and community-sourced data for broad accessibility.
Contribution
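The paper introduces Omnilingual ASR, which the authors describe as the first large-scale multilingual ASR system designed for extensibility: communities can add a previously unserved language with only a handful of data samples. It combines self-supervised pre-training scaled to 7B parameters with an encoder-decoder architecture that uses an LLM-inspired decoder.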
Findings
Supports over 1,600 languages, including 500+ new to ASR.
Achieves strong zero-shot generalization to unseen languages.
Outperforms prior systems, especially in low-resource scenarios.
Abstract
Automatic speech recognition (ASR) has advanced in high-resource languages, but most of the world's 7,000+ languages remain unsupported, leaving thousands of long-tail languages behind. Expanding ASR coverage has been costly and limited by architectures that restrict language support, making extension inaccessible to most, all while entangled with ethical concerns when pursued without community collaboration. To transcend these limitations, we introduce Omnilingual ASR, the first large-scale ASR system designed for extensibility. Omnilingual ASR enables communities to introduce unserved languages with only a handful of data samples. It scales self-supervised pre-training to 7B parameters to learn robust speech representations and introduces an encoder-decoder architecture designed for zero-shot generalization, leveraging an LLM-inspired decoder. This capability is grounded in a massive…
Code & Models
- 🤗 csukuangfj2/sherpa-onnx-omnilingual-asr-1600-languages-300M-ctc-int8-2025-11-12 (model)
- 🤗 csukuangfj2/sherpa-onnx-omnilingual-asr-1600-languages-300M-ctc-2025-11-12 (model)
- 🤗 csukuangfj2/sherpa-onnx-omnilingual-asr-1600-languages-1B-ctc-2025-11-12 (model)
- 🤗 csukuangfj2/sherpa-onnx-omnilingual-asr-1600-languages-1B-ctc-int8-2025-11-12 (model)
- 🤗 csukuangfj2/sherpa-onnx-omnilingual-asr-1600-languages-1B-ctc-v2-2026-02-05 (model)
- 🤗 csukuangfj2/sherpa-onnx-omnilingual-asr-1600-languages-1B-ctc-v2-int8-2026-02-05 (model)
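The repository names above appear to encode the model size (300M or 1B), the decoder type (ctc), an optional revision (v2), an optional int8 quantization suffix, and a release date. The Python sketch below splits a name into those fields so the variants can be compared or filtered. The naming pattern is inferred only from the six names listed here and is not a documented scheme; `ModelVariant` and `parse_variant` are hypothetical helpers, not part of any library.

```python
import re
from typing import NamedTuple, Optional


class ModelVariant(NamedTuple):
    """Fields read off a repository name (hypothetical helper type)."""
    size: str               # e.g. "300M" or "1B"
    decoder: str            # e.g. "ctc"
    version: Optional[str]  # e.g. "v2", or None when no revision tag is present
    quantized: bool         # True when the name carries an "-int8" suffix
    date: str               # release date as written in the name, YYYY-MM-DD


# Assumption: this pattern is inferred from the six names listed above.
_PATTERN = re.compile(
    r"omnilingual-asr-1600-languages-"
    r"(?P<size>\d+[MB])-(?P<decoder>[a-z]+)"
    r"(?:-(?P<version>v\d+))?"
    r"(?P<int8>-int8)?"
    r"-(?P<date>\d{4}-\d{2}-\d{2})$"
)


def parse_variant(repo_id: str) -> ModelVariant:
    """Split a repository id into size, decoder, version, quantization, date."""
    match = _PATTERN.search(repo_id)
    if match is None:
        raise ValueError(f"unrecognized repository name: {repo_id!r}")
    return ModelVariant(
        size=match["size"],
        decoder=match["decoder"],
        version=match["version"],
        quantized=match["int8"] is not None,
        date=match["date"],
    )


if __name__ == "__main__":
    repo_ids = [
        "csukuangfj2/sherpa-onnx-omnilingual-asr-1600-languages-300M-ctc-int8-2025-11-12",
        "csukuangfj2/sherpa-onnx-omnilingual-asr-1600-languages-1B-ctc-v2-2026-02-05",
    ]
    for repo_id in repo_ids:
        print(parse_variant(repo_id))
```

A caller could, for instance, keep only the entries where `quantized` is true when targeting memory-constrained devices, assuming the int8 suffix means what it conventionally does.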
Taxonomy
Topics: Speech Recognition and Synthesis · ICT in Developing Communities · Phonetics and Phonology Research
