Dual Application of Speech Enhancement for Automatic Speech Recognition

Ashutosh Pandey; Chunxi Liu; Yun Wang; Yatharth Saraf

arXiv:2011.03840 · cs.SD · November 10, 2020

TL;DR

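This paper uses a dense convolutional recurrent network (DCRN) for speech enhancement in two roles within an RNN-T based ASR system: as a data augmentation technique and as a preprocessing frontend. On a social media English video dataset, it reports an average relative improvement of 13.4% when the two are combined.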

Contribution

It introduces a novel combination of speech enhancement as both a data augmentation method and a preprocessing frontend for RNN-T based ASR systems.

Findings

01. 11.2% average relative improvement with enhancement-based data augmentation

02. 8.3% average relative improvement with enhancement as a preprocessing frontend

03. 13.4% average relative improvement with both techniques combined
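
The percentages above are relative, not absolute, so a short sketch of the arithmetic may help. It assumes the underlying metric is an error rate such as word error rate, which this page does not state, and the baseline and improved values are hypothetical numbers chosen only to illustrate the formula. The function name `relative_improvement` is likewise made up for this example.

```python
def relative_improvement(baseline_error, new_error):
    """Relative reduction in an error rate, as a percentage of the baseline.

    Assumes a lower-is-better metric (e.g. word error rate); the metric
    itself is an assumption, not something this page states.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical numbers: an error rate falling from 20.0 to 17.76
# corresponds to a relative improvement of roughly 11.2%.
example = relative_improvement(20.0, 17.76)
print(f"{example:.1f}% relative improvement")
```

Note that relative gains from separate techniques do not simply add up, which is consistent with the combined figure (13.4%) being smaller than the sum of the two individual figures (11.2% and 8.3%).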

Abstract

In this work, we exploit speech enhancement for improving a recurrent neural network transducer (RNN-T) based ASR system. We employ a dense convolutional recurrent network (DCRN) for complex spectral mapping based speech enhancement, and find it helpful for ASR in two ways: a data augmentation technique, and a preprocessing frontend. In using it for ASR data augmentation, we exploit a KL divergence based consistency loss that is computed between the ASR outputs of original and enhanced utterances. In using speech enhancement as an effective ASR frontend, we propose a three-step training scheme based on model pretraining and feature selection. We evaluate our proposed techniques on a challenging social media English video dataset, and achieve an average relative improvement of 11.2% with speech enhancement based data augmentation, 8.3% with enhancement based preprocessing, and 13.4% when…
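
The KL divergence based consistency loss mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not give the direction of the KL term, how it is weighted, or how it is applied to RNN-T outputs, so the per-frame formulation, the `weight` parameter, and all function names here are assumptions. Plain Python lists stand in for model outputs.

```python
import math


def softmax(logits):
    """Convert one frame of logits to a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * (math.log(pi + eps) - math.log(qi + eps))
        for pi, qi in zip(p, q)
    )


def consistency_loss(logits_original, logits_enhanced):
    """Mean per-frame KL between ASR outputs for the original and the
    enhanced version of the same utterance (direction is an assumption)."""
    per_frame = [
        kl_divergence(softmax(a), softmax(b))
        for a, b in zip(logits_original, logits_enhanced)
    ]
    return sum(per_frame) / len(per_frame)


def total_loss(asr_loss, logits_original, logits_enhanced, weight=1.0):
    """ASR loss plus a weighted consistency term (weighting is hypothetical)."""
    return asr_loss + weight * consistency_loss(logits_original, logits_enhanced)


# Two frames, three output symbols; the values are made up for illustration.
original = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]
enhanced = [[1.8, 0.7, 0.1], [0.1, 1.6, 0.4]]
print(consistency_loss(original, enhanced))
```

The term is zero when the model produces identical distributions for both versions of an utterance and grows as they diverge, which is what pushes the ASR model toward predictions that are stable under enhancement.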
