SirenAttack: Generating Adversarial Audio for End-to-End Acoustic Systems
Tianyu Du, Shouling Ji, Jinfeng Li, Qinchen Gu, Ting Wang, Raheem Beyah

TL;DR
SirenAttack introduces a versatile, effective, and stealthy method for generating adversarial audio that can deceive various end-to-end acoustic systems with high success rates, raising security concerns.
Contribution
It presents a novel attack method that can fool multiple acoustic systems in both white-box and black-box settings, achieving high success rates while remaining stealthy.
Findings
Achieves a 99.45% attack success rate on the IEMOCAP dataset against a ResNet18 model
Deceives multiple commercial ASR platforms, including Google Cloud and IBM
Generates stealthy adversarial audio that human listeners cannot distinguish from benign audio
Abstract
Despite their immense popularity, deep learning-based acoustic systems are inherently vulnerable to adversarial attacks, wherein maliciously crafted audios trigger target systems to misbehave. In this paper, we present SirenAttack, a new class of attacks to generate adversarial audios. Compared with existing attacks, SirenAttack stands out with a set of significant features: (i) versatile -- it is able to deceive a range of end-to-end acoustic systems under both white-box and black-box settings; (ii) effective -- it is able to generate adversarial audios that target acoustic systems recognize as specific phrases; and (iii) stealthy -- it is able to generate adversarial audios that are indistinguishable from their benign counterparts to human perception. We empirically evaluate SirenAttack on a set of state-of-the-art deep learning-based acoustic systems (including speech command…
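The abstract names three properties (working under both white-box and black-box access, steering the output to an attacker-chosen label, and keeping the perturbation imperceptible) without describing how the perturbation is computed. The Python sketch below only illustrates those properties on a toy problem; it is not SirenAttack's algorithm, which this page does not describe. Assumptions: `W` is a hypothetical 4-sample linear classifier standing in for a real acoustic model; stealthiness is approximated by an L-infinity budget `eps` and reported as an SNR in dB; the white-box attacker uses the exact gradient; and the black-box attacker is assumed to observe class scores and estimates the gradient by finite differences (a substitute technique, not the paper's). All function names are illustrative.

```python
import math

# Hypothetical toy setup; NOT SirenAttack's method. A real target would be a
# deep acoustic model (e.g. a speech-command or emotion classifier).
W = [
    [1.0, 0.0, 0.5, 0.0],   # class 0
    [0.0, 1.0, 0.0, 0.5],   # class 1
    [-0.5, 0.0, 0.0, 1.0],  # class 2
]


def logits(x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict(x):
    z = logits(x)
    return max(range(len(z)), key=lambda k: z[k])


def target_loss(x, target):
    """Cross-entropy toward the attacker's chosen target label."""
    return -math.log(softmax(logits(x))[target])


def whitebox_grad(x, target):
    """Exact gradient of target_loss for a linear model: W^T (p - onehot(target))."""
    p = softmax(logits(x))
    err = [pk - (1.0 if k == target else 0.0) for k, pk in enumerate(p)]
    return [sum(W[k][j] * err[k] for k in range(len(W))) for j in range(len(x))]


def blackbox_grad(x, target, h=1e-4):
    """Query-only estimate: symmetric finite differences of target_loss."""
    grad = []
    for j in range(len(x)):
        up = list(x)
        up[j] += h
        down = list(x)
        down[j] -= h
        grad.append((target_loss(up, target) - target_loss(down, target)) / (2 * h))
    return grad


def targeted_attack(x, target, grad_fn, eps=0.4, alpha=0.03, max_steps=50):
    """Signed-gradient descent on target_loss, clipped to an L-infinity ball of radius eps."""
    x_adv = list(x)
    for step in range(max_steps):
        if predict(x_adv) == target:
            return x_adv, step
        g = grad_fn(x_adv, target)
        x_adv = [
            min(max(xa - alpha * ((gj > 0) - (gj < 0)), xi - eps), xi + eps)
            for xa, gj, xi in zip(x_adv, g, x)
        ]
    return x_adv, max_steps


def snr_db(x, x_adv):
    """Energy of the benign signal relative to the added perturbation, in dB."""
    signal = sum(v * v for v in x)
    noise = sum((a - v) ** 2 for a, v in zip(x_adv, x))
    return 10 * math.log10(signal / noise)


if __name__ == "__main__":
    benign = [0.5, -0.3, 0.8, 0.1]  # the toy model assigns this to class 0
    for name, fn in [("white-box", whitebox_grad), ("black-box", blackbox_grad)]:
        adv, steps = targeted_attack(benign, target=2, grad_fn=fn)
        print(name, predict(adv), steps, round(snr_db(benign, adv), 2))
```

On this toy input both variants move the prediction from class 0 to the target class 2 while staying inside the `eps` budget. The black-box variant differs only in replacing the exact gradient with `2 * len(x)` loss queries per step. The resulting SNR is low because a 4-sample toy signal gives the attacker very little room; the sketch is meant to show the mechanics, not realistic imperceptibility.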
Taxonomy
Topics: Music and Audio Processing · Digital Media Forensic Detection · Speech Recognition and Synthesis
