Comparative Study on Noise-Augmented Training and its Effect on Adversarial Robustness in ASR Systems

Karla Pizzi; Matías Pizarro; Asja Fischer

arXiv:2409.01813·eess.AS·November 10, 2025

TL;DR

This paper compares how different noise-augmented training methods affect the adversarial robustness of various ASR systems, showing that noise augmentation improves both noisy speech performance and attack resistance.

Contribution

It provides a comparative analysis of four ASR architectures trained with different noise augmentation strategies, highlighting the benefits for robustness against adversarial attacks.

Findings

01 Noise augmentation improves adversarial robustness.

02 Models trained with noise augmentation perform better on noisy speech.

03 Robustness gains are consistent across different ASR architectures.

Abstract

In this study, we investigate whether noise-augmented training can concurrently improve adversarial robustness in automatic speech recognition (ASR) systems. We conduct a comparative analysis of the adversarial robustness of four different ASR architectures, each trained under three different augmentation conditions: (1) background noise, speed variations, and reverberations; (2) speed variations only; (3) no data augmentation. We then evaluate the robustness of all resulting models against attacks with white-box or black-box adversarial examples. Our results demonstrate that noise augmentation not only enhances model performance on noisy speech but also improves the model's robustness to adversarial attacks.
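To make the augmentation conditions concrete, the following is a minimal sketch of two of the three operations named in the abstract: mixing background noise into speech at a target signal-to-noise ratio, and a speed variation by resampling. It is not the authors' pipeline. The function names, the 10 dB SNR, the 0.9 and 1.1 speed factors, and the synthetic sine-wave "speech" are all assumptions chosen for illustration, and reverberation (convolution with a room impulse response) is left out for brevity.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the mixture has the requested SNR in dB."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_noise == 0.0:
        return list(speech)
    # Solve p_speech / (gain^2 * p_noise) = 10^(snr_db / 10) for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def change_speed(signal, factor):
    """Resample by linear interpolation; factor > 1 shortens the signal."""
    n_out = max(1, int(round(len(signal) / factor)))
    out = []
    for i in range(n_out):
        # Position in the input that output sample i is read from.
        pos = min(i * factor, len(signal) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        out.append((1 - frac) * signal[lo] + frac * signal[hi])
    return out


def measured_snr_db(clean, noisy):
    """SNR of a mixture, treating (noisy - clean) as the noise component."""
    p_clean = sum(c * c for c in clean) / len(clean)
    p_err = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(p_clean / p_err)


# Hypothetical inputs: one second of a 440 Hz tone at 16 kHz and white noise.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

noisy = mix_at_snr(speech, noise, snr_db=10.0)
faster = change_speed(speech, 1.1)   # assumed speed factor
slower = change_speed(speech, 0.9)   # assumed speed factor
```

In a training setup of the kind the abstract describes, condition (1) would apply operations like these (plus reverberation) to each utterance, condition (2) would apply only the speed change, and condition (3) would pass the audio through unchanged.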

Taxonomy

Topics: Speech and Audio Processing · Ultrasonics and Acoustic Wave Propagation · Acoustic Wave Phenomena Research
