Multiple Confidence Gates For Joint Training Of SE And ASR

Tianrui Wang; Weibin Zhu; Yingying Gao; Junlan Feng; Shilei Zhang

arXiv:2204.00226 · eess.AS · April 4, 2022

TL;DR

This paper introduces a joint training approach for speech enhancement and speech recognition based on multiple confidence gates. The gates filter noisy speech into features that the ASR network can fit more easily, which improves robustness in noisy environments.

Contribution

It proposes a speech confidence gate prediction module that replaces the conventional SE module in joint training, so that the features passed to the ASR network are better suited to recognition in noisy conditions.

Findings

1. Outperforms the traditional robust speech recognition system on test sets of clean speech, synthesized noisy speech, and real noisy speech.

2. Improves the robustness of speech recognition in noisy environments.

3. Demonstrates the effectiveness of confidence gates in joint training of SE and ASR.

Abstract

Joint training of a speech enhancement (SE) model and an automatic speech recognition (ASR) model is a common approach to robust ASR in noisy environments. SE focuses on improving the auditory quality of speech, but in doing so it changes the distribution of the enhanced features in ways that are uncertain and detrimental to ASR. To tackle this challenge, an approach with multiple confidence gates for joint training of SE and ASR is proposed. A speech confidence gate prediction module is designed to replace the former SE module in joint training. The noisy speech is filtered by the gates to obtain features that are easier for the ASR network to fit. Experimental results show that the proposed method outperforms the traditional robust speech recognition system on test sets of clean speech, synthesized noisy speech, and real noisy speech.
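The abstract does not specify how the gates are computed or combined, so the following is only an illustrative sketch of the general idea of filtering noisy features with several gates. Everything in it is an assumption made for illustration and is not taken from the paper: the function name `apply_confidence_gates`, the linear-plus-sigmoid gate predictor, the number of gates `K`, and the averaging used to merge the gated features.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function, mapping logits to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def apply_confidence_gates(noisy_feats, gate_weights, gate_biases):
    """Filter noisy features with K gates (hypothetical formulation).

    noisy_feats:  (T, F) non-negative features, e.g. a magnitude spectrogram.
    gate_weights: (K, F, F) one linear projection per gate (assumed form).
    gate_biases:  (K, F) one bias vector per gate.
    Returns the filtered features (T, F) and the gates (K, T, F).
    """
    # logits[k, t, f] = sum_g noisy_feats[t, g] * gate_weights[k, g, f] + bias
    logits = np.einsum("tg,kgf->ktf", noisy_feats, gate_weights)
    logits = logits + gate_biases[:, None, :]
    gates = sigmoid(logits)  # each gate value lies in (0, 1)
    # Assumption: gated copies of the input are merged by a simple average.
    filtered = (gates * noisy_feats[None, :, :]).mean(axis=0)
    return filtered, gates


# Toy input: 5 frames, 8 feature bins, 3 gates, random untrained parameters.
rng = np.random.default_rng(0)
T, F, K = 5, 8, 3
noisy = np.abs(rng.normal(size=(T, F)))
W = rng.normal(scale=0.1, size=(K, F, F))
b = np.zeros((K, F))
filtered, gates = apply_confidence_gates(noisy, W, b)
```

In a joint training setup, the gate parameters would be learned together with the ASR network rather than drawn at random as in this toy example, so that the recognition loss shapes which parts of the noisy input are kept.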

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Data Compression Techniques