Analyzing Robustness of End-to-End Neural Models for Automatic Speech Recognition
Goutham Rajendran, Wei Zou

TL;DR
This paper analyzes the robustness of the pre-trained neural speech recognition models wav2vec2, HuBERT, and DistilHuBERT to several types of noise, and reports how noise affects individual layers and how errors propagate.
Contribution
It offers a comprehensive robustness analysis of popular pre-trained speech models, including layer-wise and error propagation insights under noisy conditions.
Findings
Models degrade in performance with increased noise.
Layer-wise analysis reveals how noise affects different layers.
Error propagation patterns differ between clean and noisy data.
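To make "degrade in performance" concrete: a common way to score speech recognition output is word error rate (WER), the word-level edit distance between a reference transcript and the model's hypothesis, divided by the reference length. This summary does not state which metric the paper uses, so the sketch below is an assumption for illustration only, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: the metric choice and this helper are assumptions,
    not taken from the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    clean = word_error_rate("the cat sat on the mat", "the cat sat on the mat")
    noisy = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"clean WER: {clean:.3f}, noisy WER: {noisy:.3f}")
```

Comparing this score on transcripts of clean audio and of the same audio with noise added gives one number per noise level, which is the kind of curve the first finding describes.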
Abstract
We investigate robustness properties of pre-trained neural models for automatic speech recognition. Real-life data in machine learning is usually noisy and almost never clean, owing to domain-dependent factors such as outliers, random noise and adversarial noise. The models we develop for various tasks should therefore be robust to such noisy data, a need that gave rise to the thriving field of robust machine learning. We consider this issue in the setting of automatic speech recognition. With the increasing popularity of pre-trained models, it is important to analyze and understand how robust such models are to noise. In this work, we perform a robustness analysis of the pre-trained neural models wav2vec2, HuBERT and DistilHuBERT on the LibriSpeech and TIMIT datasets. We use different kinds of noising mechanisms and measure the…
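One simple example of a noising mechanism is additive white Gaussian noise scaled to a target signal-to-noise ratio (SNR). The abstract does not specify the mechanisms used, so the following is a minimal sketch under that assumption rather than the paper's procedure; the function name `add_gaussian_noise` and the `snr_db` parameter are hypothetical.

```python
import numpy as np


def add_gaussian_noise(waveform, snr_db, rng=None):
    """Return waveform plus white Gaussian noise at roughly snr_db decibels.

    Assumption for illustration: noise power is set from the average power
    of the whole clean signal, so SNR = 10 * log10(P_signal / P_noise).
    """
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(waveform, dtype=np.float64)

    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))

    noise = rng.normal(loc=0.0, scale=np.sqrt(noise_power), size=signal.shape)
    return signal + noise


if __name__ == "__main__":
    # One second of a 440 Hz tone at 16 kHz stands in for a speech clip.
    t = np.arange(16000) / 16000.0
    clean = np.sin(2.0 * np.pi * 440.0 * t)
    for snr_db in (20.0, 10.0, 0.0):
        noisy = add_gaussian_noise(clean, snr_db, np.random.default_rng(0))
        measured = 10.0 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
        print(f"target {snr_db:5.1f} dB, measured {measured:5.2f} dB")
```

Lower `snr_db` values mean louder noise relative to the signal, so sweeping it from high to low produces progressively harder inputs for a recognizer.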
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Neural Networks and Applications
