Beyond $L_p$ clipping: Equalization-based Psychoacoustic Attacks against ASRs
Hadi Abdullah, Muhammad Sajidur Rahman, Christian Peeters, Cassidy Gibson, Washington Garcia, Vincent Bindschaedler, Thomas Shrimpton, Patrick Traynor

TL;DR
This paper introduces an equalization-based psychoacoustic attack that targets both traditional and end-to-end ASR systems, producing adversarial audio with low audible distortion.
Contribution
It presents a novel psychoacoustic attack technique applicable to modern end-to-end ASRs, overcoming limitations of previous methods that only worked on traditional models.
Findings
Successfully attacked DeepSpeech and Wav2Letter systems.
80 out of 100 participants judged the new attack audio to be less noisy.
Achieved low audible distortion in adversarial examples.
Abstract
Automatic Speech Recognition (ASR) systems convert speech into text and can be placed into two broad categories: traditional and fully end-to-end. Both types have been shown to be vulnerable to adversarial audio examples that sound benign to the human ear but force the ASR to produce malicious transcriptions. Of these attacks, only the "psychoacoustic" attacks can create examples with relatively imperceptible perturbations, as they leverage the knowledge of the human auditory system. Unfortunately, existing psychoacoustic attacks can only be applied against traditional models, and are obsolete against the newer, fully end-to-end ASRs. In this paper, we propose an equalization-based psychoacoustic attack that can exploit both traditional and fully end-to-end ASRs. We successfully demonstrate our attack against real-world ASRs that include DeepSpeech and Wav2Letter. Moreover, we employ a…
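The summary does not describe the attack's equalization step, so the block below is only a hypothetical illustration of the general idea, not the authors' method. It treats "equalization" as applying a per-frequency gain to an adversarial perturbation so that, in every bin, the perturbation stays a fixed margin below the clean audio's magnitude. That ceiling is a crude stand-in for a real psychoacoustic masking threshold, and a single whole-signal FFT replaces the frame-by-frame analysis a practical system would use. The function name `equalize_perturbation` and the `margin_db` parameter are assumptions introduced for this sketch.

```python
import numpy as np


def equalize_perturbation(clean: np.ndarray, delta: np.ndarray,
                          margin_db: float = -20.0) -> np.ndarray:
    """Hypothetical sketch: reshape `delta` so each frequency bin of the
    result sits at least `margin_db` below the same bin of `clean`.

    Assumption: one whole-signal FFT stands in for the per-frame masking
    analysis a real psychoacoustic attack would perform.
    """
    if clean.ndim != 1 or clean.shape != delta.shape:
        raise ValueError("clean and delta must be 1-D arrays of equal length")
    n = clean.shape[0]

    # Real-input spectra of the clean audio and of the perturbation.
    clean_spec = np.fft.rfft(clean)
    delta_spec = np.fft.rfft(delta)

    # Allowed perturbation magnitude per bin: clean magnitude scaled by the margin.
    ceiling = np.abs(clean_spec) * 10.0 ** (margin_db / 20.0)
    delta_mag = np.abs(delta_spec)

    # Per-bin equalizer gain: attenuate only bins that exceed the ceiling,
    # leave the others unchanged (gain 1). Bins with zero perturbation are
    # never "over", so there is no division by zero.
    gain = np.ones_like(delta_mag)
    over = delta_mag > ceiling
    gain[over] = ceiling[over] / delta_mag[over]

    # Back to the time domain with the original length.
    return np.fft.irfft(delta_spec * gain, n=n)
```

For example, `equalize_perturbation(clean, delta, margin_db=-20.0)` returns a perturbation whose spectrum never exceeds one tenth of the clean signal's magnitude in any bin. The sketch only attenuates and never boosts: this keeps the perturbation hidden under the louder parts of the speech, but in an actual attack it would have to be interleaved with optimization against the ASR loss, since shaping alone can erase the adversarial effect.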
Taxonomy
Topics: Speech Recognition and Synthesis · Digital Media Forensic Detection · Music and Audio Processing
