Hear "No Evil", See "Kenansville": Efficient and Transferable Black-Box Attacks on Speech Recognition and Voice Identification Systems
Hadi Abdullah, Muhammad Sajidur Rahman, Washington Garcia, Logan Blue, Kevin Warren, Anurag Swarnim Yadav, Tom Shrimpton, Patrick Traynor

TL;DR
This paper presents black-box, transferable audio attacks on speech recognition and voice ID systems that cause high mistranscription and misidentification rates with minimal perceptible difference to humans.
Contribution
It introduces a novel attack that targets the signal-processing stages preceding the model rather than the model itself. This makes it model-agnostic and effective across different systems, even over cellular networks.
Findings
Attacks achieve up to 100% mistranscription and misidentification.
Humans cannot distinguish regular from perturbed audio with statistical significance.
Vowels are particularly susceptible to the attack.
Abstract
Automatic speech recognition and voice identification systems are being deployed in a wide array of applications, from providing control mechanisms to devices lacking traditional interfaces, to the automatic transcription of conversations and authentication of users. Many of these applications have significant security and privacy considerations. We develop attacks that force mistranscription and misidentification in state-of-the-art systems, with minimal impact on human comprehension. Processing pipelines for modern systems consist of signal preprocessing and feature extraction steps, whose output is fed to a machine-learned model. Prior work has focused on the models, using white-box knowledge to tailor model-specific attacks. We focus on the pipeline stages before the models, which (unlike the models) are quite similar across systems. As such, our attacks are black-box and…
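The pipeline the abstract describes can be made concrete with a minimal NumPy sketch: a preprocessing and feature-extraction front end feeding a model, with a perturbation applied to the signal before any model sees it. The sketch rests on several assumptions. The front end below is a generic textbook ASR front end (pre-emphasis, 25 ms frames with a 10 ms hop at 16 kHz, Hamming window, power spectrum). It is not the exact pipeline of any system the paper evaluates. The function `drop_low_energy_bins` and its `fraction` parameter are hypothetical. They illustrate one kind of pre-model perturbation (discarding low-magnitude frequency components) and are not presented as the authors' algorithm.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (drops the ragged tail)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def power_spectrum(x, frame_len=400, hop=160, n_fft=512, preemph=0.97):
    """Generic ASR-style front end (assumed): pre-emphasis, framing,
    Hamming window, then per-frame power spectrum |FFT|^2 / n_fft."""
    x = np.append(x[0], x[1:] - preemph * x[:-1])  # pre-emphasis filter
    frames = frame_signal(x, frame_len, hop) * np.hamming(frame_len)
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft

def drop_low_energy_bins(x, frame_len=400, fraction=0.2):
    """Hypothetical perturbation: in each non-overlapping frame, zero the
    `fraction` of DFT bins with the smallest magnitude, then resynthesize.
    Samples past the last full frame are passed through unchanged."""
    n = len(x) - len(x) % frame_len
    frames = x[:n].reshape(-1, frame_len)
    spec = np.fft.rfft(frames, axis=1)
    k = int(fraction * spec.shape[1])  # number of bins to discard per frame
    if k > 0:
        weakest = np.argsort(np.abs(spec), axis=1)[:, :k]
        np.put_along_axis(spec, weakest, 0, axis=1)
    out = np.fft.irfft(spec, n=frame_len, axis=1).reshape(-1)
    return np.concatenate([out, x[n:]])

# Usage: a 1 s synthetic "utterance" (440 Hz tone plus light noise) at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
x = 0.5 * np.sin(2 * np.pi * 440 * t) + 0.01 * np.random.default_rng(0).standard_normal(sr)
y = drop_low_energy_bins(x, fraction=0.2)
P_clean, P_adv = power_spectrum(x), power_spectrum(y)  # features a model would consume
print(P_clean.shape, np.sum(y ** 2) / np.sum(x ** 2))
```

Because a perturbation like this operates only on the waveform, it needs no access to a model's weights or gradients. In principle, the same perturbed audio could be fed to any system whose front end resembles this one. That is the intuition behind the abstract's point that shared pipeline stages enable black-box attacks.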
