Parrot-Trained Adversarial Examples: Pushing the Practicality of Black-Box Audio Attacks against Speaker Recognition Models
Rui Duan, Zhe Qu, Leah Ding, Yao Liu, Zhuo Lu

TL;DR
This paper introduces a novel black-box audio attack method using parrot training and voice conversion to generate adversarial examples with high transferability, requiring minimal knowledge about the target speaker recognition system.
Contribution
It proposes a new parrot training mechanism leveraging short speech samples and voice conversion to create effective adversarial examples without probing the target model.
Findings
Achieves up to 80.8% attack success rate in digital scenarios.
Attains a success rate of up to 58.3% against smart devices.
Demonstrates high transferability of adversarial examples with good perceptual quality.
Abstract
Audio adversarial examples (AEs) pose significant security challenges to real-world speaker recognition systems. Most black-box attacks still require certain information from the speaker recognition model to be effective (e.g., repeated probing and knowledge of similarity scores). This work aims to push the practicality of black-box attacks by minimizing the attacker's knowledge about a target speaker recognition model. Although it is not feasible for an attacker to succeed with completely zero knowledge, we assume that the attacker only knows a short speech sample (a few seconds long) of a target speaker. Without any probing to gain further knowledge about the target model, we propose a new mechanism, called parrot training, to generate AEs against the target model. Motivated by recent advancements in voice conversion (VC), we propose to use the one short sentence…
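The idea summarized above (turn one short target sample into many "parrot" samples, train a local surrogate on them, then craft AEs on the surrogate without ever querying the target model) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes made-up 8-dimensional feature vectors instead of audio, a hypothetical `fake_voice_conversion` helper that only adds noise in place of a real VC model, and a logistic-regression surrogate in place of a speaker recognition network. Every name and parameter below is illustrative.

```python
import math
import random

rng = random.Random(0)   # fixed seed so the toy run is repeatable
DIM = 8                  # size of the made-up feature vectors (assumption)
EPS = 0.5                # per-feature perturbation budget (assumption)

def fake_voice_conversion(target_sample, n, jitter=0.1):
    """Hypothetical stand-in for a VC model: n noisy copies of the target features."""
    return [[x + rng.gauss(0.0, jitter) for x in target_sample] for _ in range(n)]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(w, b, x):
    """Surrogate's estimated probability that x belongs to the target speaker."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_surrogate(parrot, others, epochs=200, lr=0.1):
    """Fit a logistic-regression surrogate with plain SGD (parrot = 1, others = 0)."""
    w, b = [0.0] * DIM, 0.0
    data = [(x, 1.0) for x in parrot] + [(x, 0.0) for x in others]
    for _ in range(epochs):
        for x, y in data:
            err = score(w, b, x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def craft_ae(w, b, carrier, eps=EPS, steps=10):
    """Signed-gradient ascent on the surrogate score, clipped to an eps-ball around carrier."""
    x = list(carrier)
    step = eps / steps
    for _ in range(steps):
        p = score(w, b, x)
        grad = [(1.0 - p) * wi for wi in w]  # d log(p) / dx for a logistic model
        x = [xi + step * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]
        x = [min(max(xi, ci - eps), ci + eps) for xi, ci in zip(x, carrier)]
    return x

# The attacker's only knowledge: one short sample of the target speaker.
target_sample = [rng.gauss(1.0, 0.5) for _ in range(DIM)]
# Samples from other speakers, which the attacker can collect freely.
others = [[rng.gauss(-1.0, 0.5) for _ in range(DIM)] for _ in range(20)]

parrot = fake_voice_conversion(target_sample, 20)  # "parrot" training data
w, b = train_surrogate(parrot, others)             # surrogate trained locally
carrier = others[0]                                # benign input to perturb
ae = craft_ae(w, b, carrier)                       # crafted without any probing

print("surrogate score, carrier:", score(w, b, carrier))
print("surrogate score, AE:     ", score(w, b, ae))
```

The sketch only checks that the perturbed input scores higher on the surrogate than the unmodified carrier while staying within the `EPS` budget. Whether such an example transfers to a separate target model is the empirical question the paper's reported success rates address, and this toy says nothing about it.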
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Speech Recognition and Synthesis · Hate Speech and Cyberbullying Detection
