Robustifying automatic speech recognition by extracting slowly varying features
Matías Pizarro, Dorothea Kolossa, Asja Fischer

TL;DR
This paper introduces a defense for automatic speech recognition (ASR) systems against targeted adversarial attacks. It removes fast-changing features from the audio input using slow feature analysis, a low-pass filter, or both, which significantly improves robustness while keeping performance on clean data close to that of baseline models.
Contribution
The paper proposes a novel defense mechanism that applies slow feature analysis, low-pass filtering, or both to the audio input, enhancing ASR robustness against targeted adversarial attacks.
Findings
Models trained with the proposed defense are over four times more robust against targeted adversarial attacks.
Performance on clean data remains similar to that of the baseline models.
The defense effectively reduces vulnerability to adversarial examples.
Abstract
In the past few years, it has been shown that deep learning systems are highly vulnerable under attacks with adversarial examples. Neural-network-based automatic speech recognition (ASR) systems are no exception. Targeted and untargeted attacks can modify an audio input signal in such a way that humans still recognise the same words, while ASR systems are steered to predict a different transcription. In this paper, we propose a defense mechanism against targeted adversarial attacks consisting in removing fast-changing features from the audio signals, either by applying slow feature analysis, a low-pass filter, or both, before feeding the input to the ASR system. We perform an empirical analysis of hybrid ASR models trained on data pre-processed in such a way. While the resulting models perform quite well on benign data, they are significantly more robust against targeted adversarial…
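To make the low-pass filtering idea concrete, here is a minimal sketch of removing fast-changing components from a signal before it would reach an ASR system. It is an illustration only: the filter type (a first-order IIR filter), the 500 Hz cutoff, the 16 kHz sample rate, and the helper names `low_pass`, `tone`, and `rms` are all assumptions, not details taken from the paper, and slow feature analysis is not shown.

```python
import math


def low_pass(samples, sample_rate_hz, cutoff_hz):
    """First-order IIR low-pass filter (assumed design, not the paper's)."""
    if sample_rate_hz <= 0 or cutoff_hz <= 0:
        raise ValueError("sample rate and cutoff must be positive")
    # Discretized RC filter: alpha = dt / (RC + dt), with RC = 1 / (2*pi*fc).
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    filtered = []
    state = samples[0] if samples else 0.0
    for x in samples:
        # Move the state a fraction alpha toward the new sample.
        state += alpha * (x - state)
        filtered.append(state)
    return filtered


def tone(freq_hz, sample_rate_hz, n_samples):
    """Pure sine wave, used here as a stand-in for real audio."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz)
            for i in range(n_samples)]


def rms(samples):
    """Root-mean-square amplitude of a signal."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


if __name__ == "__main__":
    sample_rate = 16000  # hypothetical sample rate in Hz
    slow = tone(100, sample_rate, sample_rate)   # slowly varying component
    fast = tone(4000, sample_rate, sample_rate)  # fast-changing component
    for name, signal in (("100 Hz", slow), ("4000 Hz", fast)):
        kept = rms(low_pass(signal, sample_rate, 500)) / rms(signal)
        print(f"{name}: fraction of amplitude kept = {kept:.2f}")
```

Running the script compares how much of a slow and a fast sine wave survives the filter: the slow component passes nearly unchanged, while the fast one is strongly attenuated. A practical defense would likely use a sharper filter and, as the abstract notes, train the ASR model on data pre-processed in the same way.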
