Unified Modeling of Multi-Domain Multi-Device ASR Systems
Soumyajit Mitra, Swayambhu Nath Ray, Bharat Padi, Arunasish Sen, Raghavendra Bilgi, Harish Arsikere, Shalini Ghosh, Ajay Srinivasamurthy, Sri Garimella

TL;DR
This paper introduces a single unified model for multi-domain, multi-device ASR that combines domain embeddings, domain experts, a mixture of experts, and adversarial training, and that outperforms carefully tuned per-domain models.
Contribution
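It presents a unified model architecture for multi-domain, multi-device ASR that replaces a portfolio of per-domain, per-device models by combining domain embedding, domain experts, a mixture of experts, and adversarial training, and it reports accuracy improvements over per-domain models.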
Findings
The unified model outperforms carefully tuned per-domain models, with relative gains of up to 10% over a baseline model
These gains come with a negligible increase in the number of parameters
Ablation studies show the contribution of each component to the accuracy of the unified model
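As a reading aid for the "relative gain" figure above, here is a minimal sketch of how a relative gain is typically computed from an error rate. It assumes the metric is an error rate where lower is better (for example word error rate); this summary does not state which metric the paper uses. The function name `relative_gain` and the numbers are hypothetical and not taken from the paper.

```python
def relative_gain(baseline_error: float, new_error: float) -> float:
    """Relative error reduction of new_error with respect to baseline_error.

    Assumes both values are error rates (lower is better).
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical error rates, chosen only to illustrate the arithmetic.
gain = relative_gain(baseline_error=10.0, new_error=9.0)
print(f"relative gain: {gain:.1%}")
```

Under this definition, a drop from a 10.0 to a 9.0 error rate is a 10% relative gain even though the absolute change is one point.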
Abstract
Modern Automatic Speech Recognition (ASR) systems often use a portfolio of domain-specific models in order to get high accuracy for distinct user utterance types across different devices. In this paper, we propose an innovative approach that integrates the different per-domain per-device models into a unified model, using a combination of domain embedding, domain experts, mixture of experts and adversarial training. We run careful ablation studies to show the benefit of each of these innovations in contributing to the accuracy of the overall unified model. Experiments show that our proposed unified modeling approach actually outperforms the carefully tuned per-domain models, giving relative gains of up to 10% over a baseline model with negligible increase in the number of parameters.
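To make two of the ingredients named in the abstract more concrete, the following is a minimal sketch of a layer that appends a learned domain embedding to the input features and mixes several expert projections with a softmax gate. It is an illustration under stated assumptions, not the paper's architecture: this summary does not describe layer sizes, where the experts sit in the network, or how the gate is conditioned. The class `DomainMoELayer`, its parameters, and the random weights are all hypothetical, and adversarial training is not shown.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Multiply a (rows x cols) weight matrix by a vector x of length cols."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def random_matrix(rows, cols, rng):
    """Small random weights standing in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class DomainMoELayer:
    """Hypothetical layer: domain embedding + softmax-gated mixture of experts."""

    def __init__(self, feat_dim, num_domains, emb_dim, num_experts, seed=0):
        rng = random.Random(seed)
        # One embedding vector per domain (assumption: domains are integer ids).
        self.domain_emb = random_matrix(num_domains, emb_dim, rng)
        in_dim = feat_dim + emb_dim
        # Each expert is a single linear map back to feat_dim.
        self.experts = [random_matrix(feat_dim, in_dim, rng)
                        for _ in range(num_experts)]
        # The gate scores every expert from the same concatenated input.
        self.gate = random_matrix(num_experts, in_dim, rng)

    def forward(self, features, domain_id):
        # Concatenate acoustic features with the domain embedding.
        x = list(features) + self.domain_emb[domain_id]
        gate_weights = softmax(linear(self.gate, x))
        expert_outputs = [linear(expert, x) for expert in self.experts]
        # Weighted sum of expert outputs, one value per output dimension.
        mixed = [sum(w * out[i] for w, out in zip(gate_weights, expert_outputs))
                 for i in range(len(features))]
        return mixed, gate_weights


layer = DomainMoELayer(feat_dim=4, num_domains=3, emb_dim=2, num_experts=3)
mixed, gate_weights = layer.forward([0.5, -1.0, 0.25, 2.0], domain_id=1)
print(len(mixed), round(sum(gate_weights), 6))
```

Because the experts share one input and only the gate weights differ per utterance, a design like this adds few parameters relative to keeping a separate model per domain, which is consistent with the "negligible increase in the number of parameters" claimed in the abstract.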
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling · Speech and dialogue systems
