TL;DR
This paper demonstrates the creation of targeted audio adversarial examples that are nearly indistinguishable from original audio but transcribe as any chosen phrase, highlighting vulnerabilities in speech recognition systems.
Contribution
The authors develop a white-box, iterative, optimization-based attack that generates targeted audio adversarial examples against DeepSpeech with a 100% success rate, revealing new security challenges for speech recognition.
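Below is a rough sketch of what an iterative, optimization-based targeted attack minimizes. It is an assumed general form, not a transcription of the paper's exact objective. The goal is to find a small perturbation δ that, added to the waveform x, makes the model transcribe the result as the target phrase t.

```latex
\min_{\delta}\; \|\delta\|_2^2 \;+\; c \cdot \ell(x + \delta,\, t)
\qquad \text{subject to} \qquad \|\delta\|_\infty \le \tau
```

In this sketch, ℓ is a loss that is small when the model transcribes x + δ as t. For a CTC-trained model such as DeepSpeech, the CTC loss is a natural choice. The constant c > 0 trades perturbation size against attack strength, and τ caps the amplitude of every sample of δ. Both c and τ are illustrative symbols, not values taken from the paper.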
Findings
Achieved over 99.9% similarity between original and adversarial audio
Adversarial examples transcribed as the chosen target phrase with a 100% success rate on DeepSpeech
Introduced a new domain for adversarial example research in audio
Abstract
We construct targeted audio adversarial examples on automatic speech recognition. Given any audio waveform, we can produce another that is over 99.9% similar, but transcribes as any phrase we choose (recognizing up to 50 characters per second of audio). We apply our white-box iterative optimization-based attack to Mozilla's implementation of DeepSpeech end-to-end, and show it has a 100% success rate. The feasibility of this attack introduces a new domain in which to study adversarial examples.
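The following is a minimal, hypothetical sketch of an iterative white-box targeted attack of this kind. It does not use DeepSpeech. A tiny linear classifier over fixed-length frames stands in for the model, and a per-frame cross-entropy stands in for the CTC loss DeepSpeech is trained with. The sketch runs projected gradient descent on ||δ||² + c·loss and clips every sample of δ to [-τ, τ] after each step. It reports distortion as dB(δ) − dB(x), with dB(v) = 20·log10(max|v_i|), which is an assumed loudness measure chosen only for illustration. All names (`attack`, `frames_of`, `predict`, `c`, `tau`) and all constants are hypothetical.

```python
import math
import random


def db(v):
    """Peak loudness in decibels: 20 * log10(max_i |v_i|)."""
    return 20.0 * math.log10(max(abs(s) for s in v))


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(a - m) for a in z]
    s = sum(e)
    return [a / s for a in e]


def logits(frame, W):
    """Hypothetical toy model: one weight row per output class, dotted with the frame."""
    return [sum(f * w for f, w in zip(frame, row)) for row in W]


def frames_of(x, T):
    """Split a waveform into T equal, non-overlapping frames."""
    n = len(x) // T
    return [x[i * n:(i + 1) * n] for i in range(T)]


def predict(x, W, T):
    """Most likely class for each frame (the toy 'transcription')."""
    out = []
    for f in frames_of(x, T):
        s = logits(f, W)
        out.append(s.index(max(s)))
    return out


def objective(x, delta, W, target, c):
    """||delta||^2 + c * (sum of per-frame cross-entropy toward `target`)."""
    adv = [a + d for a, d in zip(x, delta)]
    ce = sum(-math.log(softmax(logits(f, W))[t])
             for f, t in zip(frames_of(adv, len(target)), target))
    return sum(d * d for d in delta) + c * ce


def attack(x, W, target, c=10.0, tau=1.0, steps=500):
    T, n = len(target), len(x) // len(target)
    # Step size 1/L, where L = 2 + c * ||W||_F^2 upper-bounds the objective's
    # curvature, so each projected step cannot increase the objective.
    lr = 1.0 / (2.0 + c * sum(w * w for row in W for w in row))
    delta = [0.0] * len(x)
    history = [objective(x, delta, W, target, c)]
    for _ in range(steps):
        adv = [a + d for a, d in zip(x, delta)]
        grad = [2.0 * d for d in delta]  # gradient of ||delta||^2
        for i, (f, t) in enumerate(zip(frames_of(adv, T), target)):
            p = softmax(logits(f, W))
            # d(CE)/d(logit_k) = p_k - [k == t], chained through the linear layer
            for k, row in enumerate(W):
                g = c * (p[k] - (1.0 if k == t else 0.0))
                for j in range(n):
                    grad[i * n + j] += g * row[j]
        # Gradient step, then project each sample back into [-tau, tau]
        delta = [max(-tau, min(tau, d - lr * g)) for d, g in zip(delta, grad)]
        history.append(objective(x, delta, W, target, c))
    return delta, history


# Usage: perturb random stand-in "audio" so that every frame is pushed toward a different class
rng = random.Random(0)
T, n, K = 4, 8, 3  # frames, samples per frame, output classes ("characters")
W = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(K)]
x = [rng.gauss(0.0, 0.1) for _ in range(T * n)]
target = [(k + 1) % K for k in predict(x, W, T)]

delta, history = attack(x, W, target)
print("target:    ", target)
print("prediction:", predict([a + d for a, d in zip(x, delta)], W, T))
print("dB_x(delta) = %.1f dB" % (db(delta) - db(x)))
```

The toy objective is convex, and the step size comes from a bound on its curvature, so no projected step can increase the objective. A real attack on a deep network such as DeepSpeech has no such guarantee.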
