TL;DR
This paper demonstrates a physical attack on practical speaker verification systems. A universal adversarial perturbation, played as a separate audio source while the adversary speaks, achieves high targeted success rates without being rejected by replay detection.
Contribution
It introduces a two-step optimization algorithm for universal adversarial perturbations that remain effective in physical scenarios, bypass replay detection, and preserve speech recognition accuracy.
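The summary does not spell out the two optimization steps, so the sketch below only illustrates the generic idea behind a universal perturbation: one additive signal `delta`, optimized against many utterances at once under an L-infinity bound. It assumes a toy linear "speaker embedder" `W` and cosine scoring against a target embedding `t`. These names, the signed-gradient (PGD-style) update, and all parameter values are hypothetical. The RIR and speech-recognition terms of the paper's method are omitted, so this is not the authors' algorithm.

```python
import numpy as np

def cosine(u, t):
    return float(u @ t / (np.linalg.norm(u) * np.linalg.norm(t)))

def cosine_grad(u, t):
    # d/du cos(u, t) = t / (|u||t|) - (u.t) u / (|u|^3 |t|)
    nu, nt = np.linalg.norm(u), np.linalg.norm(t)
    return t / (nu * nt) - (u @ t) * u / (nu ** 3 * nt)

def mean_score(X, W, t, delta):
    # Average similarity to the target speaker over all utterances.
    return float(np.mean([cosine(W @ (x + delta), t) for x in X]))

def universal_perturbation(X, W, t, eps=0.1, step=0.01, iters=100):
    # Hypothetical toy setup: W is a linear embedder, t a target embedding.
    delta = np.zeros(X.shape[1])
    for _ in range(iters):
        # Average the gradient over utterances so a single delta serves them all.
        g = np.mean([W.T @ cosine_grad(W @ (x + delta), t) for x in X], axis=0)
        # Signed gradient ascent, then projection back into the L-inf ball.
        delta = np.clip(delta + step * np.sign(g), -eps, eps)
    return delta

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 256))          # 20 toy "utterances"
W = rng.standard_normal((16, 256)) / 16.0   # toy linear speaker embedder
t = rng.standard_normal(16)                 # toy target speaker embedding
delta = universal_perturbation(X, W, t)
print(mean_score(X, W, t, np.zeros(256)), mean_score(X, W, t, delta))
```

Averaging gradients across utterances, rather than optimizing per utterance, is what makes the perturbation input-agnostic; in the paper's setting that property is what lets it work regardless of the dynamic authentication text.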
Findings
Targeted attack success rate of 100% in physical experiments
Minimal impact on speech recognition, with only a 3.55% increase in WER
Perturbations remain effective after being played over the air
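WER (word error rate) is conventionally the word-level edit distance (substitutions, deletions, insertions) between the recognized text and the reference, divided by the number of reference words. A minimal sketch of that standard definition follows; the summary does not say whether the 3.55% figure is an absolute or a relative increase, and the example strings are illustrative only.

```python
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("seven three one four", "seven three one five"))  # 0.25
```

One substituted word out of four reference words gives a WER of 0.25.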
Abstract
In authentication scenarios, practical speaker verification systems usually require a person to read a dynamic authentication text. Previous studies performed physical attacks by playing an audio adversarial example as a digital signal, which audio replay detection modules would easily reject. This work shows that if our crafted adversarial perturbation is played as a separate source while the adversary is speaking, the practical speaker verification system misjudges the adversary as the target speaker. A two-step algorithm is proposed to optimize the universal adversarial perturbation so that it is text-independent and has little effect on recognition of the authentication text. We also estimated the room impulse response (RIR) within the algorithm, which allowed the perturbation to remain effective after being played over the air. In the physical experiment, we achieved targeted attacks…
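Accounting for over-the-air playback with an RIR usually means convolving the played signal with the impulse response. The sketch below assumes the universal perturbation is tiled to the length of the live speech, convolved with a given RIR, and added to the speech. The function names, the `scale` parameter, and the synthetic decaying-noise RIR are illustrative assumptions, not the paper's RIR estimation procedure, and only the perturbation's path through the room is modeled.

```python
import numpy as np

def toy_rir(length=400, decay=0.01, seed=0):
    """Hypothetical synthetic RIR: direct path plus exponentially decaying noise."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal(length) * np.exp(-decay * np.arange(length))
    h[0] = 1.0  # direct-path impulse
    return h / np.max(np.abs(h))

def simulate_over_the_air(speech, perturbation, rir, scale=1.0):
    """Mix a separately played perturbation into live speech via an RIR."""
    # Repeat the (shorter) universal perturbation cyclically to cover the utterance.
    tiled = np.resize(perturbation, len(speech))
    # Room propagation: full linear convolution, truncated to the speech length.
    received = np.convolve(tiled, rir)[: len(speech)]
    return speech + scale * received

rng = np.random.default_rng(1)
speech = rng.standard_normal(16000)             # 1 s of placeholder audio at 16 kHz
perturbation = 0.01 * rng.standard_normal(1600)
mixed = simulate_over_the_air(speech, perturbation, toy_rir())
print(mixed.shape)  # (16000,)
```

Placing such a convolution inside the optimization loop, for example with a different estimated RIR at each step, is one common way to make a perturbation robust to playback; whether the paper does exactly this is not stated in the summary.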
