Building Robust and Scalable Multilingual ASR for Indian Languages

Arjun Gangwar; Kaousheik Jayakumar; S. Umesh

arXiv:2511.15418·cs.CL·November 20, 2025

TL;DR

This paper presents multilingual ASR systems for Indian languages, built for the ASRU MADASR 2.0 challenge, that use a Multi-Decoder architecture with a phonemic Common Label Set (CLS) as an intermediate representation. The systems beat the baseline in 3 languages (Track 2) in WER/CER and improve language and dialect prediction.

Contribution

Introduces a training approach for multilingual ASR that uses a Multi-Decoder architecture with a phonemic Common Label Set (CLS) as an intermediate representation. It improves over the baseline without additional data and achieves the highest language and dialect ID accuracy among challenge participants.

Findings

01 Outperforms the baseline in 3 languages (Track 2) in terms of WER/CER

02 Achieves the highest language and dialect ID accuracy among participants

03 Discusses methods for converting phonemic output back to graphemes while retaining the gains obtained in the phonemic space

Abstract

This paper describes the systems developed by SPRING Lab, Indian Institute of Technology Madras, for the ASRU MADASR 2.0 challenge. The systems focus on adapting ASR models to better predict the language and dialect of an utterance among 8 languages across 33 dialects. We participated in Track 1 and Track 2, which restrict the use of additional data, and developed multilingual systems from scratch. We present a novel training approach using a Multi-Decoder architecture with a phonemic Common Label Set (CLS) as an intermediate representation. This approach improves performance over the baseline in the CLS space. We also discuss various methods used to retain the gain obtained in the phonemic space when converting the output back to the corresponding grapheme representations. Our systems beat the baseline in 3 languages (Track 2) in terms of WER/CER and achieved the highest…
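
The WER/CER figures mentioned in the abstract are edit-distance-based error rates. As a minimal sketch, assuming the standard definition (Levenshtein distance divided by reference length) rather than the challenge's official scoring script, they could be computed as follows. The function names `edit_distance`, `wer`, and `cer` are illustrative and not taken from the paper. `cer` here counts Unicode code points including spaces, which is an assumption; scoring tools for Indic scripts may normalize text or count characters differently.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) between two sequences."""
    # prev[j] holds the distance between the first i-1 reference items
    # and the first j hypothesis items.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate over code points (assumption: spaces are counted)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For example, `wer("the cat sat", "the bat sat")` counts one substitution against three reference words. CER is typically preferred over WER for languages where word boundaries or agglutination make word-level scoring overly harsh.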

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Text and Document Classification Technologies