TL;DR
This paper investigates how to create robust audio adversarial examples that still fool speech recognition systems when played over the air, and traces the failure of earlier generation methods to overfitting.
Contribution
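The authors show that state-of-the-art adversarial audio generation methods overfit because of the binning operation in the target speech recognition system (e.g., Mozilla Deepspeech), and propose an approach that mitigates this flaw so that the examples remain effective at varying offsets.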
Findings
Improved adversarial example generation with varying offsets
Enhanced success rate of over-the-air attacks
Empirical comparison of edit distance confirms a significant improvement
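The edit distance mentioned above is commonly computed as the Levenshtein distance between the transcription the recognizer produces and the attacker's target phrase. The following is a minimal sketch of that metric; the function name and the example phrases are illustrative assumptions, not taken from the paper, and the authors may compute the distance differently (for example at word level).

```python
def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance between two strings.

    Hypothetical helper for illustration; not the authors' code.
    """
    # prev[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


# Illustrative phrases only: a target transcription and what a
# recognizer might return for a recorded adversarial example.
target = "open the door"
recognized = "open the floor"
distance = edit_distance(recognized, target)
```

A distance of zero means the recognizer returned exactly the target phrase; larger values mean the attack drifted further from it.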
Abstract
Adversarial examples tremendously threaten the availability and integrity of machine learning-based systems. While the feasibility of such attacks has been observed first in the domain of image processing, recent research shows that speech recognition is also susceptible to adversarial attacks. However, reliably bridging the air gap (i.e., making the adversarial examples work when recorded via a microphone) has so far eluded researchers. We find that due to flaws in the generation process, state-of-the-art adversarial example generation methods cause overfitting because of the binning operation in the target speech recognition system (e.g., Mozilla Deepspeech). We devise an approach to mitigate this flaw and find that our method improves generation of adversarial examples with varying offsets. We confirm the significant improvement with our approach by empirical comparison of the edit…
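To make the role of the binning operation more concrete, the sketch below splits a toy signal into fixed-size frames and shows how a small time offset changes what every frame contains. It is an illustration under assumptions only: the frame length, hop size, and helper names are hypothetical, and this is neither the feature pipeline of Mozilla Deepspeech nor the authors' mitigation.

```python
def frame_signal(samples, frame_len, hop):
    """Split `samples` into overlapping frames of `frame_len` samples,
    advancing `hop` samples per frame (a toy stand-in for binning)."""
    last_start = len(samples) - frame_len
    return [samples[s:s + frame_len] for s in range(0, last_start + 1, hop)]


def shift(samples, offset):
    """Prepend `offset` zeros, mimicking playback that starts at an
    unknown point relative to the recognizer's frame grid."""
    return [0] * offset + list(samples)


# Toy values chosen for readability, not realistic audio parameters.
signal = list(range(1, 13))
FRAME_LEN, HOP = 4, 2

frames = frame_signal(signal, FRAME_LEN, HOP)
frames_off1 = frame_signal(shift(signal, 1), FRAME_LEN, HOP)  # offset < hop
frames_off2 = frame_signal(shift(signal, 2), FRAME_LEN, HOP)  # offset == hop

# An offset of one sample changes the content of every frame ...
no_frame_survives = all(f not in frames for f in frames_off1)
# ... while an offset of exactly one hop only delays the same frames.
same_frames_delayed = frames_off2[1:] == frames
```

A perturbation tuned against one fixed alignment only ever sees the frames in `frames`. Over the air, the alignment is not under the attacker's control, which is one way to read the abstract's emphasis on examples that work with varying offsets.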
