TL;DR
This paper introduces a method for generating audio adversarial examples that still fool a speech recognition system after being played back and recorded in the physical world, despite environmental noise and reverberation, and it highlights the security threat this poses.
Contribution
The authors develop a novel approach that simulates physical-world transformations during adversarial example generation, improving attack robustness in real-world scenarios.
Findings
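Adversarial examples successfully attack a state-of-the-art speech recognition model after playback and recording in a physical environment.
In a listening experiment, human listeners did not notice the attack.
Previous approaches assume the adversarial example is fed directly to the model and fail under reverberation and noise; the proposed method remains effective under these conditions.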
Abstract
We propose a method to generate audio adversarial examples that can attack a state-of-the-art speech recognition model in the physical world. Previous work assumes that generated adversarial examples are directly fed to the recognition model, and is not able to perform such a physical attack because of reverberation and noise from playback environments. In contrast, our method obtains robust adversarial examples by simulating transformations caused by playback or recording in the physical world and incorporating the transformations into the generation process. Evaluation and a listening experiment demonstrated that our adversarial examples are able to attack without being noticed by humans. This result suggests that audio adversarial examples generated by the proposed method may become a real threat.
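The idea of incorporating simulated playback transformations into the generation process can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes reverberation can be approximated by convolution with a room impulse response and environmental noise by additive white Gaussian noise. The names `simulate_playback`, `expected_loss`, and `loss_fn` are hypothetical, and `loss_fn` stands in for whatever loss the attacked recognition model provides.

```python
import numpy as np


def simulate_playback(audio, impulse_response, noise_std, rng):
    """Approximate one playback-and-recording pass (illustrative assumption).

    Reverberation is modeled as convolution with an impulse response,
    and environmental noise as additive white Gaussian noise.
    """
    # Convolve with the impulse response and trim to the input length.
    reverberated = np.convolve(audio, impulse_response, mode="full")[: len(audio)]
    # Add zero-mean Gaussian noise with the requested standard deviation.
    noise = rng.normal(0.0, noise_std, size=len(audio))
    return reverberated + noise


def expected_loss(audio, perturbation, impulse_responses, noise_std, loss_fn, rng):
    """Average a hypothetical attack loss over several simulated environments.

    Minimizing this average, rather than the loss on the clean signal,
    is what pushes the perturbation toward robustness to the transformations.
    """
    adversarial = audio + perturbation
    losses = [
        loss_fn(simulate_playback(adversarial, ir, noise_std, rng))
        for ir in impulse_responses
    ]
    return float(np.mean(losses))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio = rng.normal(size=16000)            # one second of placeholder audio
    perturbation = 0.01 * rng.normal(size=16000)
    rooms = [np.array([1.0]), np.array([1.0, 0.5, 0.25])]  # toy impulse responses
    energy = lambda y: float(np.mean(y ** 2))  # placeholder for a model loss
    print(expected_loss(audio, perturbation, rooms, 0.01, energy, rng))
```

In a real attack the perturbation would be updated by gradient descent on this averaged loss using a differentiable framework and the recognition model's own loss; the NumPy version above only shows how the transformations enter the objective.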
