A practical framework for multi-domain speech recognition and an instance sampling method to neural language modeling

Yike Zhang; Xiaobing Feng; Yi Liu; Songjun Cao; Long Ma

arXiv:2203.04767·eess.AS·March 10, 2022

Open Access

TL;DR

This paper introduces a multi-domain speech recognition framework with domain-specific reranking and an instance sampling method for neural language models, significantly improving accuracy in navigation and music domains.

Contribution

It presents a practical multi-domain ASR framework with a novel instance sampling method to handle data imbalance in neural language modeling.

Findings

01 Achieves 13-22% character error rate reduction

02 Effective in navigation and music domains

03 Improves multi-domain speech recognition accuracy

Abstract

Automatic speech recognition (ASR) systems used on smartphones or in vehicles are usually required to process speech queries from very different domains. In such situations, a vanilla ASR system usually fails to perform well on every domain. This paper proposes a multi-domain ASR framework for Tencent Map, a navigation app used on smartphones and in-vehicle infotainment systems. The proposed framework consists of three core parts: a basic ASR module that generates n-best lists for a speech query, a text classification module that determines which domain the speech query belongs to, and a reranking module that rescores the n-best lists using domain-specific language models. In addition, an instance-sampling-based method for training neural network language models (NNLMs) is proposed to address the data imbalance problem in multi-domain ASR. In experiments, the proposed framework was evaluated on…
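The three-part flow described in the abstract (n-best generation, domain classification, domain-specific reranking) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the keyword-based `classify_domain`, the toy `unigram_lm` scorer, the `lm_weight` interpolation, the choice to classify the top first-pass hypothesis, and the English example data. The paper's actual classifier, language models, and score combination are not specified in the text above.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[str, float]  # (text, first-pass ASR log-score)


def classify_domain(text: str, keywords: Dict[str, List[str]],
                    default: str = "general") -> str:
    """Hypothetical stand-in for the text classification module (keyword counts)."""
    counts = {d: sum(text.count(k) for k in kws) for d, kws in keywords.items()}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else default


def unigram_lm(probs: Dict[str, float], floor: float = 1e-4) -> Callable[[str], float]:
    """Hypothetical stand-in for a domain-specific LM: unigram log-probability."""
    def score(text: str) -> float:
        return sum(math.log(probs.get(w, floor)) for w in text.split())
    return score


def rerank(nbest: List[Hypothesis],
           lms: Dict[str, Callable[[str], float]],
           keywords: Dict[str, List[str]],
           lm_weight: float = 0.5) -> Tuple[str, List[Hypothesis]]:
    """Pick a domain, then rescore the n-best list with that domain's LM."""
    # Assumption: the domain is decided from the top first-pass hypothesis.
    top_text = max(nbest, key=lambda h: h[1])[0]
    domain = classify_domain(top_text, keywords)
    lm = lms.get(domain)
    if lm is None:
        # No LM for this domain: keep the first-pass ordering.
        return domain, sorted(nbest, key=lambda h: h[1], reverse=True)
    rescored = [(t, s + lm_weight * lm(t)) for t, s in nbest]
    return domain, sorted(rescored, key=lambda h: h[1], reverse=True)


# Toy data, invented for this example.
keywords = {"music": ["play", "song"], "navigation": ["navigate", "route"]}
lms = {
    "music": unigram_lm({"play": 0.2, "yesterday": 0.1, "by": 0.1,
                         "the": 0.1, "beatles": 0.1}),
    "navigation": unigram_lm({"navigate": 0.2, "to": 0.2, "airport": 0.1}),
}
nbest = [("play yesterday by the beetles", -4.0),
         ("play yesterday by the beatles", -4.2)]

domain, reranked = rerank(nbest, lms, keywords)
print(domain, reranked[0][0])
```

In this toy run the first-pass top hypothesis contains a word the music LM has not seen, so the second hypothesis overtakes it after rescoring. A real system would replace both stand-ins with trained models and tune the interpolation weight on held-out data.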

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems