Dynamic Multi-Expert Projectors with Stabilized Routing for Multilingual Speech Recognition
Isha Pandey, Ashish Mittal, Vartul Bahuguna, Ganesh Ramakrishnan

TL;DR
This paper introduces SMEAR-MoE, a stabilized mixture-of-experts projector for multilingual speech recognition. By enabling both cross-lingual sharing and expert specialization, it improves accuracy over a single-projector baseline while keeping runtime comparable.
Contribution
The paper proposes SMEAR-MoE, a novel stabilized MoE projector that prevents expert collapse and enhances multilingual ASR performance compared to traditional single-projector methods.
Findings
Achieves up to 7.6% relative WER reduction over the single-projector baseline.
Experts show linguistically meaningful specialization.
Maintains runtime efficiency comparable to the single-projector baseline.
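Relative WER reduction is measured against the baseline's own error rate, not in absolute WER points. The following is a minimal sketch of the standard formula; the function name is hypothetical and the numbers are illustrative, not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Relative WER reduction in percent: 100 * (baseline - system) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


# Illustrative numbers only, not taken from the paper
print(relative_wer_reduction(20.0, 18.0))  # → 10.0
```

For a hypothetical baseline at 25.0% WER, a 7.6% relative reduction corresponds to about 1.9 absolute points (25.0 → 23.1).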
Abstract
Recent advances in LLM-based ASR connect frozen speech encoders with Large Language Models (LLMs) via lightweight projectors. While effective in monolingual settings, a single projector struggles to capture the diverse acoustic-to-semantic mappings required for multilingual ASR. To address this, we propose SMEAR-MoE, a stabilized Mixture-of-Experts projector that ensures dense gradient flow to all experts, preventing expert collapse while enabling cross-lingual sharing. We systematically compare monolithic, static multi-projector, and dynamic MoE designs across four Indic languages (Hindi, Marathi, Tamil, Telugu). Our SMEAR-MoE achieves strong performance, delivering up to a 7.6% relative WER reduction over the single-projector baseline, while maintaining comparable runtime efficiency. Analysis of expert routing further shows linguistically meaningful specialization, with related…
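The abstract describes the projector only at a high level. The sketch below assumes the name refers to SMEAR-style soft merging of experts (Soft Merging of Experts with Adaptive Routing), where router probabilities average the experts' parameters into a single merged projector, so every expert contributes to every example and would receive a gradient during training. Everything here (the class `SmearProjector`, utterance-level mean-pooled routing, two-layer MLP experts, all dimensions) is a hypothetical illustration, not the authors' implementation; the paper's stabilization details are not given in this summary.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over the last axis
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class SmearProjector:
    """Hypothetical SMEAR-style MoE projector (assumption, not the paper's code).

    Each expert is a 2-layer MLP mapping encoder frames (d_in) to the LLM
    embedding size (d_out). Router probabilities average the experts'
    parameters into one merged MLP, which is then applied once.
    """

    def __init__(self, d_in, d_hidden, d_out, n_experts, seed=0):
        rng = np.random.default_rng(seed)
        self.W_router = rng.normal(0.0, 0.02, (d_in, n_experts))
        self.W1 = rng.normal(0.0, 0.02, (n_experts, d_in, d_hidden))
        self.b1 = np.zeros((n_experts, d_hidden))
        self.W2 = rng.normal(0.0, 0.02, (n_experts, d_hidden, d_out))
        self.b2 = np.zeros((n_experts, d_out))

    def route(self, h):
        # One routing decision per utterance: mean-pool frames, then softmax
        pooled = h.mean(axis=0)                  # (d_in,)
        return softmax(pooled @ self.W_router)   # (n_experts,), all > 0

    def __call__(self, h):
        # h: (T, d_in) frozen-encoder features for one utterance
        p = self.route(h)
        # Probability-weighted parameter merge: every expert contributes
        W1 = np.einsum("e,eij->ij", p, self.W1)
        b1 = p @ self.b1
        W2 = np.einsum("e,eij->ij", p, self.W2)
        b2 = p @ self.b2
        hidden = np.maximum(h @ W1 + b1, 0.0)    # ReLU
        return hidden @ W2 + b2, p


# Usage with random features standing in for encoder output
h = np.random.default_rng(1).normal(size=(50, 16))
proj = SmearProjector(d_in=16, d_hidden=32, d_out=8, n_experts=4)
out, p = proj(h)
print(out.shape)   # → (50, 8)
print(p.round(3))  # routing weights over the 4 experts, summing to 1
```

With nonlinear experts, merging parameters is not the same as averaging expert outputs. It also costs one MLP forward pass regardless of the number of experts, which would be consistent with the comparable-runtime claim, though this summary does not confirm that the authors use exactly this mechanism.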
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research
