SirenAttack: Generating Adversarial Audio for End-to-End Acoustic Systems

Tianyu Du; Shouling Ji; Jinfeng Li; Qinchen Gu; Ting Wang; Raheem Beyah

arXiv:1901.07846 · cs.CR · July 25, 2019 · 19 cites

TL;DR

SirenAttack introduces a versatile, effective, and stealthy method for generating adversarial audio that can deceive various end-to-end acoustic systems with high success rates, raising security concerns.

Contribution

The paper presents a novel attack that fools multiple acoustic systems in both white-box and black-box settings, with high success rates and stealthy adversarial audio.

Findings

01. Achieves a 99.45% attack success rate on IEMOCAP against ResNet18

02. Deceives multiple ASR platforms, such as Google Cloud and IBM

03. Generates stealthy adversarial audio that is indistinguishable from benign audio to human listeners

Abstract

Despite their immense popularity, deep learning-based acoustic systems are inherently vulnerable to adversarial attacks, wherein maliciously crafted audio triggers target systems to misbehave. In this paper, we present SirenAttack, a new class of attacks to generate adversarial audio. Compared with existing attacks, SirenAttack is distinguished by a set of significant features: (i) versatile -- it is able to deceive a range of end-to-end acoustic systems under both white-box and black-box settings; (ii) effective -- it is able to generate adversarial audio that can be recognized as specific phrases by target acoustic systems; and (iii) stealthy -- it is able to generate adversarial audio indistinguishable from its benign counterpart to human perception. We empirically evaluate SirenAttack on a set of state-of-the-art deep learning-based acoustic systems (including speech command…

Taxonomy

Topics: Music and Audio Processing · Digital Media Forensic Detection · Speech Recognition and Synthesis