Investigation of Monaural Front-End Processing for Robust ASR without Retraining or Joint-Training

Zhihao Du; Xueliang Zhang; Jiqing Han

arXiv:1810.09067 · cs.SD · October 25, 2018

TL;DR

This paper investigates whether monaural speech separation, used as a front-end, can directly improve automatic speech recognition (ASR) performance without retraining or joint-training the acoustic model, and reports substantial relative WER reductions on the CHiME-3 dataset.

Contribution

It demonstrates the effectiveness of monaural speech separation as a stand-alone ASR front-end, without retraining or joint-training the acoustic model, a setting that has received relatively little attention.

Findings

1. 36.40% relative WER reduction for GMM-based ASR
2. 11.78% relative WER reduction for DNN-based ASR
3. Enhanced features improve recognition performance without retraining
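The findings above are stated as relative WER reductions, which are measured against the baseline (unenhanced) WER rather than as absolute percentage-point drops. A minimal sketch of the arithmetic in Python; the function name and the example WER values are hypothetical illustrations, not numbers from the paper.

```python
def relative_wer_reduction(baseline_wer: float, enhanced_wer: float) -> float:
    """Return the relative WER reduction in percent.

    Both inputs must be on the same scale (e.g. both in percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    # Share of the baseline errors removed by the front-end, expressed in percent.
    return 100.0 * (baseline_wer - enhanced_wer) / baseline_wer


# Hypothetical values for illustration only: a WER drop from 20.0% to 15.0%.
print(relative_wer_reduction(20.0, 15.0))  # 25.0
```

So a 36.40% relative reduction means roughly a third of the baseline errors were removed; it does not mean the WER fell by 36.40 percentage points.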

Abstract

In recent years, monaural speech separation has been formulated as a supervised learning problem, which has been systematically studied and shown to dramatically improve speech intelligibility and quality for human listeners. However, it has not been well investigated whether these methods can be employed as front-end processing to directly improve the performance of a machine listener, i.e., an automatic speech recognizer, without retraining or joint-training the acoustic model. In this paper, we explore the effectiveness of independent front-end processing for a multi-condition-trained ASR system on the CHiME-3 challenge. We find that directly feeding the enhanced features to the ASR yields 36.40% and 11.78% relative WER reductions for the GMM-based and DNN-based ASR, respectively. We also investigate the effect of noisy phase and generalization ability under unmatched noise…
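Supervised monaural separation is commonly framed as estimating a time-frequency mask that is applied to the noisy short-time Fourier transform (STFT). The Python sketch below uses the ideal ratio mask (IRM), a widely used training target, computed from oracle clean and noise STFTs, and applies it to the noisy magnitude while reusing the noisy phase, the usual simplification whose effect the abstract says the authors examine. This is an assumption for illustration: the abstract does not state which mask, network, or features the authors use, the function names are hypothetical, and in a real front-end the mask would be predicted by a trained model from the noisy input alone.

```python
import numpy as np


def ideal_ratio_mask(clean_stft: np.ndarray, noise_stft: np.ndarray,
                     eps: float = 1e-12) -> np.ndarray:
    """IRM = sqrt(|S|^2 / (|S|^2 + |N|^2)); values lie in [0, 1)."""
    s_pow = np.abs(clean_stft) ** 2
    n_pow = np.abs(noise_stft) ** 2
    # eps avoids division by zero in silent time-frequency bins.
    return np.sqrt(s_pow / (s_pow + n_pow + eps))


def enhance_with_noisy_phase(noisy_stft: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Scale the noisy magnitude by the mask and keep the noisy phase unchanged."""
    magnitude = mask * np.abs(noisy_stft)
    phase = np.angle(noisy_stft)
    return magnitude * np.exp(1j * phase)


# Toy data: random complex arrays (frequency bins x frames) standing in for real STFTs.
rng = np.random.default_rng(0)
shape = (257, 100)
clean = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
noise = 0.5 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
noisy = clean + noise  # additive noise model in the STFT domain

mask = ideal_ratio_mask(clean, noise)
enhanced = enhance_with_noisy_phase(noisy, mask)
print(bool(mask.min() >= 0.0), bool(mask.max() <= 1.0))  # True True
```

In the setting the abstract describes, recognition features (for example, filterbank or MFCC features) would then be extracted from the enhanced signal and passed to the unchanged multi-condition-trained acoustic model.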

Taxonomy

Topics: Speech and Audio Processing · Ultrasonics and Acoustic Wave Propagation · Speech Recognition and Synthesis