Bring the Noise: Introducing Noise Robustness to Pretrained Automatic Speech Recognition

Patrick Eickhoff; Matthias Möller; Theresa Pekarek Rosin; Johannes Twiefel; Stefan Wermter

arXiv:2309.02145·cs.CL·September 6, 2023·1 cites

Open Access

TL;DR

This paper introduces Cleancoder, a noise-robust preprocessor for speech recognition that can be integrated with various encoder-decoder models to improve accuracy in noisy environments.

Contribution

It presents a novel, architecture-agnostic denoising preprocessor that enhances automatic speech recognition performance under noisy conditions.

Findings

01

Cleancoder effectively filters noise from speech signals.

02

Used as a frontend, Cleancoder improves (lowers) the Word Error Rate (WER) of ASR models in noisy environments.

03

Cleancoder works both with pretrained ASR models and with ASR models trained from scratch.
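To make the WER metric in the findings concrete, here is a minimal sketch of how it is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and an ASR hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions for illustration; the paper's own evaluation tooling is not described on this page.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: one dropped word out of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

A lower value is better, so "improving" WER means reducing it. Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.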

Abstract

In recent speech processing research, large End-to-End (E2E) systems for Automatic Speech Recognition (ASR) have reported state-of-the-art performance on various benchmarks. These systems intrinsically learn how to handle and remove noise conditions from speech. Previous research has shown that it is possible to extract the denoising capabilities of these models into a preprocessor network, which can be used as a frontend for downstream ASR models. However, the proposed methods were limited to specific fully convolutional architectures. In this work, we propose a novel method to extract the denoising capabilities that can be applied to any encoder-decoder architecture. We propose the Cleancoder preprocessor architecture that extracts hidden activations from the Conformer ASR model and feeds them to a decoder to predict denoised spectrograms. We train our…
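The data flow described in the abstract (noisy spectrogram, then hidden activations from an ASR encoder, then a decoder that predicts a denoised spectrogram) can be sketched on toy data. Everything below is an assumption for illustration only: the frozen random projection stands in for the Conformer encoder, the least-squares linear map stands in for the decoder, and the dimensions, noise level, and names (`frozen_encoder`, `fit_decoder`, `cleancoder_like`) are hypothetical. It is not the authors' architecture or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions, chosen for illustration (not taken from the paper).
N_FRAMES, N_MELS, N_HIDDEN = 200, 8, 32

# Stand-in for a pretrained ASR encoder: a fixed random projection.
# A real setup would read hidden activations from a Conformer instead.
W_ENC = rng.normal(size=(N_MELS, N_HIDDEN))


def frozen_encoder(spectrogram):
    """Map (frames, N_MELS) to hidden activations (frames, N_HIDDEN)."""
    return np.tanh(spectrogram @ W_ENC)


def fit_decoder(hidden, clean):
    """Fit a linear decoder by least squares: hidden activations -> clean frames."""
    w_dec, *_ = np.linalg.lstsq(hidden, clean, rcond=None)
    return w_dec


def cleancoder_like(noisy, w_dec):
    """Preprocessor forward pass: encode the noisy input, decode a denoised estimate."""
    return frozen_encoder(noisy) @ w_dec


# Synthetic "clean" spectrogram frames and a noisy copy of them.
clean = rng.normal(size=(N_FRAMES, N_MELS))
noisy = clean + 0.3 * rng.normal(size=clean.shape)

# Only the decoder is fitted; the encoder weights stay fixed.
w_dec = fit_decoder(frozen_encoder(noisy), clean)
denoised = cleancoder_like(noisy, w_dec)

mse_decoder = float(np.mean((denoised - clean) ** 2))
mse_zero = float(np.mean(clean ** 2))  # error of predicting all zeros
```

The output of `cleancoder_like` has the same shape as the input spectrogram, which is what lets such a preprocessor sit in front of a downstream ASR model without changing that model's input interface.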

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Advanced Data Compression Techniques