Sequential Randomized Smoothing for Adversarially Robust Speech Recognition
Raphael Olivier, Bhiksha Raj

TL;DR
This paper introduces a speech recognition model that combines randomized smoothing with speech enhancement and ROVER voting to resist adversarial attacks, addressing the difficulty of applying randomized smoothing to sequential outputs.
Contribution
The paper adapts randomized smoothing to speech recognition by integrating speech enhancement and ROVER voting, yielding a new defense against adversarial perturbations.
Findings
The strongest defense is robust to all attacks that use inaudible noise
It can only be broken with very high distortion
The evaluation uses adaptive versions of state-of-the-art attacks, such as the Imperceptible ASR attack
Abstract
While Automatic Speech Recognition has been shown to be vulnerable to adversarial attacks, defenses against these attacks are still lagging. Existing, naive defenses can be partially broken with an adaptive attack. In classification tasks, the Randomized Smoothing paradigm has been shown to be effective at defending models. However, it is difficult to apply this paradigm to ASR tasks, due to their complexity and the sequential nature of their outputs. Our paper overcomes some of these challenges by leveraging speech-specific tools like enhancement and ROVER voting to design an ASR model that is robust to perturbations. We apply adaptive versions of state-of-the-art attacks, such as the Imperceptible ASR attack, to our model, and show that our strongest defense is robust to all attacks that use inaudible noise, and can only be broken with very high distortion.
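The smoothing idea described in the abstract can be sketched roughly as follows: add Gaussian noise to several copies of the input, optionally enhance each copy, transcribe each one, and vote on the result. This is a minimal illustration and not the authors' implementation. The names `smoothed_transcribe`, `transcribe`, `enhance`, `sigma`, and `n_samples` are hypothetical, and the vote here is a plain majority over whole transcripts, which stands in for ROVER rather than implementing it.

```python
import random
from collections import Counter
from typing import Callable, List, Optional, Sequence


def smoothed_transcribe(
    audio: Sequence[float],
    transcribe: Callable[[List[float]], str],
    enhance: Optional[Callable[[List[float]], List[float]]] = None,
    sigma: float = 0.01,
    n_samples: int = 8,
    seed: int = 0,
) -> str:
    """Transcribe noisy copies of `audio` and return the most frequent transcript.

    `transcribe` and `enhance` are placeholders (assumptions) for an ASR model
    and a speech-enhancement front end; neither is taken from the paper's code.
    """
    rng = random.Random(seed)
    votes: Counter = Counter()
    for _ in range(n_samples):
        # Perturb every sample with independent zero-mean Gaussian noise.
        noisy = [x + rng.gauss(0.0, sigma) for x in audio]
        # Optionally clean up the noisy signal before recognition.
        if enhance is not None:
            noisy = enhance(noisy)
        votes[transcribe(noisy)] += 1
    # Majority vote over complete transcripts (a simplified stand-in for ROVER).
    return votes.most_common(1)[0][0]
```

ROVER votes at the word level after aligning the hypotheses, so it can recover a correct sentence even when no two noisy transcripts match exactly. The whole-transcript vote above does not have that property and is only meant to show the overall structure of the pipeline.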
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Speech Recognition and Synthesis · Hate Speech and Cyberbullying Detection
Methods: Randomized Smoothing
