Speaker Reinforcement Using Target Source Extraction for Robust Automatic Speech Recognition

Catalin Zorila; Rama Doddipatla

arXiv:2205.04433·eess.AS·May 10, 2022

TL;DR

This paper proposes a speaker reinforcement method that improves single-channel automatic speech recognition (ASR) accuracy in noisy conditions without retraining the acoustic model. It does so by remixing the enhanced signal with the unprocessed input.

Contribution

The paper introduces a remixing approach, evaluated with a DNN speaker-extraction denoiser trained with a perceptually motivated loss function, that improves ASR accuracy without retraining the acoustic model.

Findings

01. Achieves about 23% and 25% relative accuracy gains over the unprocessed input on the simulated and real monaural CHiME-4 evaluation sets, respectively.

02. Outperforms state-of-the-art reference methods.

03. Effective on both simulated and real noisy recordings.
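
As a reading aid for the relative figures above, here is a minimal sketch of how a relative gain is commonly computed from two error rates. It assumes accuracy is measured as word error rate (WER), which this summary does not state explicitly; the function name `relative_gain` and the numbers in the usage lines are hypothetical and are not results from the paper.

```python
def relative_gain(baseline_wer: float, system_wer: float) -> float:
    """Relative error reduction, in percent, of a system over a baseline.

    Assumption: both arguments are word error rates in percent, and the
    "relative gain" is the fraction of the baseline error that is removed.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


# Hypothetical numbers chosen only to illustrate the arithmetic.
example_gain = relative_gain(baseline_wer=20.0, system_wer=15.0)
print(f"relative gain: {example_gain:.1f}%")
```

Under this definition, a gain of 25% means that a quarter of the baseline errors are removed, not that the error rate drops by 25 absolute points.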

Abstract

Improving the accuracy of single-channel automatic speech recognition (ASR) in noisy conditions is challenging. Strong speech enhancement front-ends are available; however, they typically require that the ASR model be retrained to cope with the processing artifacts. In this paper we explore a speaker reinforcement strategy for improving recognition performance without retraining the acoustic model (AM). This is achieved by remixing the enhanced signal with the unprocessed input to alleviate the processing artifacts. We evaluate the proposed approach using a DNN speaker-extraction-based speech denoiser trained with a perceptually motivated loss function. Results show that (without AM retraining) our method yields about 23% and 25% relative accuracy gains compared with the unprocessed input for the monaural simulated and real CHiME-4 evaluation sets, respectively, and outperforms a…
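
The remixing step described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the excerpt does not give the exact mixing rule, so the weighted sum, the `reinforcement_db` parameter, and the normalisation by `1 + gain` are all hypothetical choices. The enhanced signal is assumed to be time-aligned with the input and of the same length.

```python
import numpy as np


def remix(noisy: np.ndarray, enhanced: np.ndarray,
          reinforcement_db: float = 0.0) -> np.ndarray:
    """Blend a denoiser output back with the unprocessed input.

    Assumed rule (hypothetical, not taken from the paper):
        out = (noisy + g * enhanced) / (1 + g),  g = 10 ** (reinforcement_db / 20)
    A larger reinforcement_db weights the enhanced (target speaker) signal
    more heavily; a very negative value leaves the input almost untouched.
    """
    noisy = np.asarray(noisy, dtype=np.float64)
    enhanced = np.asarray(enhanced, dtype=np.float64)
    if noisy.shape != enhanced.shape:
        raise ValueError("noisy and enhanced must have the same shape")
    gain = 10.0 ** (reinforcement_db / 20.0)
    # Dividing by (1 + gain) keeps the output in the range of the two inputs.
    return (noisy + gain * enhanced) / (1.0 + gain)


# Toy usage with synthetic signals standing in for real audio.
rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
noisy_input = speech + 0.5 * rng.standard_normal(16000)
denoised = speech  # stand-in for the output of a speech denoiser
asr_input = remix(noisy_input, denoised, reinforcement_db=6.0)
```

The intuition matches the abstract: keeping part of the unprocessed input masks the denoiser's processing artifacts, so an acoustic model trained on unprocessed audio sees a signal closer to its training conditions while the target speaker is still emphasised.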

Taxonomy

Methods: Attention Model