A practical framework for multi-domain speech recognition and an instance sampling method to neural language modeling
Yike Zhang, Xiaobing Feng, Yi Liu, Songjun Cao, Long Ma

TL;DR
This paper introduces a multi-domain speech recognition framework with domain-specific reranking and an instance sampling method for training neural language models, reducing character error rate by 13-22% in the navigation and music domains.
Contribution
It presents a practical multi-domain ASR framework with a novel instance sampling method to handle data imbalance in neural language modeling.
Findings
Achieves 13-22% character error rate reduction
Effective in navigation and music domains
Improves multi-domain speech recognition accuracy
Abstract
Automatic speech recognition (ASR) systems used on smart phones or vehicles are usually required to process speech queries from very different domains. In such situations, a vanilla ASR system usually fails to perform well on every domain. This paper proposes a multi-domain ASR framework for Tencent Map, a navigation app used on smart phones and in-vehicle infotainment systems. The proposed framework consists of three core parts: a basic ASR module to generate n-best lists for a speech query, a text classification module to determine which domain the speech query belongs to, and a reranking module to rescore the n-best lists using domain-specific language models. In addition, an instance-sampling-based method for training neural network language models (NNLMs) is proposed to address the data imbalance problem in multi-domain ASR. In experiments, the proposed framework was evaluated on…
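The three-part flow described in the abstract (n-best generation, domain classification, domain-specific rescoring) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Hypothesis` type, the keyword classifier, the toy language-model functions, the `lm_weight` interpolation, and the choice to classify the top ASR hypothesis are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Hypothesis:
    text: str
    asr_score: float  # log-score from the basic ASR module (higher is better)


def classify_domain(text: str) -> str:
    """Toy keyword classifier standing in for the text classification module."""
    music_words = ("play", "song", "album")
    if any(word in text.lower() for word in music_words):
        return "music"
    return "navigation"


def rerank(
    nbest: List[Hypothesis],
    domain_lms: Dict[str, Callable[[str], float]],
    lm_weight: float = 0.5,
) -> List[Hypothesis]:
    """Rescore an n-best list with the language model of the predicted domain.

    Assumption: the domain is predicted from the top ASR hypothesis, and the
    final score is asr_score + lm_weight * domain LM log-score.
    """
    domain = classify_domain(nbest[0].text)
    lm = domain_lms[domain]
    return sorted(
        nbest,
        key=lambda h: h.asr_score + lm_weight * lm(h.text),
        reverse=True,
    )


# Hypothetical stand-ins for domain-specific language models (log-scores).
toy_lms = {
    "music": lambda t: 0.0 if "song" in t else -5.0,
    "navigation": lambda t: 0.0 if "road" in t else -5.0,
}

nbest = [
    Hypothesis("play the sung", -1.0),  # top ASR hypothesis, but implausible
    Hypothesis("play the song", -1.2),
]
best = rerank(nbest, toy_lms)[0]
print(best.text)
```

In this toy setup the music language model penalizes the first hypothesis, so the second one is promoted after rescoring. A real system would replace the keyword classifier and the lambda scorers with trained models.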
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems
