Unified Modeling of Multi-Domain Multi-Device ASR Systems

Soumyajit Mitra; Swayambhu Nath Ray; Bharat Padi; Arunasish Sen; Raghavendra Bilgi; Harish Arsikere; Shalini Ghosh; Ajay Srinivasamurthy; Sri Garimella

arXiv:2205.06655 · cs.CL · October 14, 2022

Open Access

TL;DR

This paper folds the separate per-domain, per-device ASR models into a single unified model using domain embeddings, domain experts, a mixture of experts, and adversarial training. The unified model outperforms carefully tuned per-domain models.

Contribution

A unified model architecture for multi-domain, multi-device ASR that combines domain embedding, domain experts, mixture of experts, and adversarial training, giving relative gains of up to 10% over a baseline model.

Findings

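01 Outperforms carefully tuned per-domain models, with relative gains of up to 10% over a baseline model

02 Achieves these gains with a negligible increase in the number of parameters

03 Shows through ablation studies how much each component (domain embedding, domain experts, mixture of experts, adversarial training) contributes to overall accuracy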

Abstract

Modern Automatic Speech Recognition (ASR) systems often use a portfolio of domain-specific models in order to get high accuracy for distinct user utterance types across different devices. In this paper, we propose an innovative approach that integrates the different per-domain per-device models into a unified model, using a combination of domain embedding, domain experts, mixture of experts and adversarial training. We run careful ablation studies to show the benefit of each of these innovations in contributing to the accuracy of the overall unified model. Experiments show that our proposed unified modeling approach actually outperforms the carefully tuned per-domain models, giving relative gains of up to 10% over a baseline model with negligible increase in the number of parameters.
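To make two of the ingredients named in the abstract more concrete, the following is a minimal sketch of a mixture-of-experts layer whose gate is conditioned on a learned domain embedding. It is an illustration under stated assumptions, not the architecture from the paper: the class `DomainMoELayer`, its parameters, the linear experts, the random initial weights, and the choice to concatenate the embedding with the input features are all hypothetical, and the adversarial training component is not shown.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Small random weights standing in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class DomainMoELayer:
    """Hypothetical layer: experts mixed by a domain-aware gate."""

    def __init__(self, num_domains, feat_dim, emb_dim, num_experts, seed=0):
        rng = random.Random(seed)
        in_dim = feat_dim + emb_dim
        # One embedding vector per domain (assumed lookup table).
        self.domain_emb = random_matrix(num_domains, emb_dim, rng)
        # Each expert is a single linear map in this sketch.
        self.experts = [random_matrix(feat_dim, in_dim, rng)
                        for _ in range(num_experts)]
        # The gate scores every expert from features plus domain embedding.
        self.gate = random_matrix(num_experts, in_dim, rng)

    def forward(self, features, domain_id):
        # Condition on the domain by concatenating its embedding.
        x = list(features) + self.domain_emb[domain_id]
        weights = softmax(matvec(self.gate, x))
        expert_outs = [matvec(expert, x) for expert in self.experts]
        # Weighted sum of expert outputs, one value per feature dimension.
        mixed = [sum(w * out[i] for w, out in zip(weights, expert_outs))
                 for i in range(len(features))]
        return mixed, weights


layer = DomainMoELayer(num_domains=3, feat_dim=4, emb_dim=2, num_experts=3)
frame = [0.1, -0.2, 0.3, 0.05]  # one made-up acoustic feature frame
output, gate_weights = layer.forward(frame, domain_id=1)
```

Because the domain embedding enters both the gate and the experts, the same frame is routed and transformed differently depending on `domain_id`, which is the general idea behind sharing one model across domains while still letting it specialize.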

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Speech and dialogue systems