An Integrated Algorithm for Robust and Imperceptible Audio Adversarial Examples
Armin Ettenhofer, Jan-Philipp Schulze, Karla Pizzi

TL;DR
This paper introduces an integrated algorithm for creating audio adversarial examples that are both imperceptible to humans and robust against physical transformations, using psychoacoustic models and neural network-generated room impulse responses.
Contribution
It presents a novel integrated approach combining psychoacoustic models and dynamic room impulse responses during generation, improving robustness and imperceptibility of audio adversarial examples.
Findings
Improved signal-to-noise ratio in adversarial examples.
Perturbations rated as less perceptible by listeners in the human study.
Trade-off observed with increased word error rate.
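The two quantitative metrics named above can be made concrete with a small sketch. It assumes the usual definitions: signal-to-noise ratio in dB between the original audio and the added perturbation, and word error rate as word-level edit distance divided by reference length. The function names `snr_db` and `word_error_rate` are illustrative and are not taken from the paper's code.

```python
import numpy as np


def snr_db(clean, adversarial):
    """Signal-to-noise ratio in dB; the 'noise' is the adversarial perturbation."""
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(adversarial, dtype=np.float64) - clean
    noise_power = np.sum(noise ** 2)
    if noise_power == 0:
        return float("inf")  # no perturbation at all
    return 10.0 * np.log10(np.sum(clean ** 2) / noise_power)


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcription must not be empty")
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1] / len(ref)
```

A higher SNR means a quieter perturbation. For a targeted attack, a lower WER against the target transcription means the ASR output is closer to what the attacker intended, which is why the two metrics can pull in opposite directions.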
Abstract
Audio adversarial examples are audio files that have been manipulated to fool an automatic speech recognition (ASR) system, while still sounding benign to a human listener. Most methods to generate such samples are based on a two-step algorithm: first, a viable adversarial audio file is produced, then, this is fine-tuned with respect to perceptibility and robustness. In this work, we present an integrated algorithm that uses psychoacoustic models and room impulse responses (RIR) in the generation step. The RIRs are dynamically created by a neural network during the generation process to simulate a physical environment to harden our examples against transformations experienced in over-the-air attacks. We compare the different approaches in three experiments: in a simulated environment and in a realistic over-the-air scenario to evaluate the robustness, and in a human study to evaluate…
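To illustrate what simulating a physical environment with a room impulse response involves, here is a minimal sketch that convolves a signal with an RIR. The paper generates its RIRs with a neural network; this sketch substitutes a synthetic RIR made of exponentially decaying noise, which is an assumption for illustration only. The names `synthetic_rir` and `apply_room` and all parameter values are hypothetical.

```python
import numpy as np


def synthetic_rir(sample_rate=16000, rt60=0.3, length_s=0.5, seed=0):
    """Stand-in RIR: Gaussian noise under an envelope that falls 60 dB in rt60 seconds."""
    rng = np.random.default_rng(seed)
    n = int(sample_rate * length_s)
    t = np.arange(n) / sample_rate
    envelope = 10.0 ** (-3.0 * t / rt60)  # amplitude factor 1e-3 (60 dB) at t = rt60
    rir = rng.standard_normal(n) * envelope
    rir[0] = 1.0  # direct sound path
    return rir / np.max(np.abs(rir))


def apply_room(audio, rir):
    """Convolve audio with the RIR, trim to the input length, and peak-normalize."""
    audio = np.asarray(audio, dtype=np.float64)
    wet = np.convolve(audio, rir, mode="full")[: len(audio)]
    peak = np.max(np.abs(wet))
    return wet / peak if peak > 0 else wet
```

In an integrated generation scheme like the one the abstract describes, a transformation of this kind would sit inside the optimization loop, so that the perturbation is optimized on the reverberated signal rather than on the clean one. The sketch covers only the forward simulation, not the optimization.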
