Quantifying and Maximizing the Benefits of Back-End Noise Adaption on Attention-Based Speech Recognition Models

Coleman Hooper; Thierry Tambe; Gu-Yeon Wei

arXiv:2105.01134 · eess.AS · September 27, 2021

TL;DR

This paper investigates how attention-based BLSTM speech recognition models adapt to noise. It identifies the model components that matter most for noise robustness, demonstrates the benefit of fine-tuning from a model pretrained on noisy data, and releases an open-source dataset augmentation tool.

Contribution

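It presents a detailed analysis of noise adaptation in attention-based BLSTM models, carried out by freezing model components during fine-tuning. The analysis highlights the importance of the first encoder layer and shows the advantage of fine-tuning from a model pretrained on noisy speech rather than on clean speech alone.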

Findings

01 Fine-tuning on noisy data improves accuracy in noisy environments.

02 The first encoder layer is critical for noise adaptation.

03 Weights in the first encoder layer are more important than those in other layers.

Abstract

This work analyzes how attention-based Bidirectional Long Short-Term Memory (BLSTM) models adapt to noise-augmented speech. We identify crucial components for noise adaptation in BLSTM models by freezing model components during fine-tuning. We first freeze larger model subnetworks and then, after identifying the encoder's importance for noise adaptation, pursue a fine-grained freezing approach within the encoder. The first encoder layer is shown to be crucial for noise adaptation, and its weights are shown to be more important than those of the other layers. When tested on a target noisy environment, fine-tuning on that environment from a model pretrained with noisy speech yields appreciable accuracy benefits over fine-tuning from a model pretrained with only clean speech. For this analysis, we built our own dataset augmentation tool, which is open-sourced to encourage future…
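
The freezing procedure described in the abstract can be illustrated with a small sketch. This is not the authors' code: the parameter names (`encoder.layer0.weight` and so on), the scalar "parameters", and the `sgd_step` helper are all hypothetical and chosen only to show the idea. Freezing a component means its parameters are excluded from the gradient update during fine-tuning, so any change in accuracy can be attributed to the components that remained trainable.

```python
def sgd_step(params, grads, frozen_prefixes, lr=0.1):
    """Apply one plain SGD update, leaving frozen parameters untouched.

    params and grads map parameter names to floats (stand-ins for tensors).
    A parameter is frozen if its name starts with any entry of frozen_prefixes.
    """
    prefixes = tuple(frozen_prefixes)
    updated = {}
    for name, value in params.items():
        if name.startswith(prefixes):
            updated[name] = value  # frozen: keep the pretrained value
        else:
            updated[name] = value - lr * grads[name]  # trainable: update
    return updated


# Hypothetical parameter layout; the real model's names and shapes differ.
params = {
    "encoder.layer0.weight": 1.0,
    "encoder.layer0.bias": 0.5,
    "encoder.layer1.weight": -2.0,
    "attention.weight": 0.25,
    "decoder.weight": 3.0,
}
grads = {name: 1.0 for name in params}

# Coarse experiment: freeze everything except the encoder.
coarse = sgd_step(params, grads, ["attention.", "decoder."])

# Fine-grained experiment: only the first encoder layer stays trainable.
fine = sgd_step(params, grads, ["encoder.layer1.", "attention.", "decoder."])

for name in params:
    print(name, params[name], "->", fine[name])
```

In a deep learning framework the same effect is usually obtained by disabling gradient tracking on the selected parameters or by leaving them out of the optimizer; the trailing dot in each prefix is there so that a prefix such as `encoder.layer1.` does not accidentally match a layer named `encoder.layer10`.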

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing