Enrollment-stage Backdoor Attacks on Speaker Recognition Systems via Adversarial Ultrasound
Xinfeng Li, Junning Ze, Chen Yan, Yushi Cheng, Xiaoyu Ji, Wenyuan Xu

TL;DR
This paper introduces Tuner, an inaudible ultrasound backdoor attack targeting the enrollment stage of speaker recognition systems, demonstrating high success rates and robustness against defenses.
Contribution
The paper presents a novel ultrasound-based backdoor attack method for speaker recognition enrollment, addressing challenges of user variability and enhancing real-world robustness.
Findings
Successfully bypasses multiple SRS models
Remains effective across different speakers and speech content
Robust against various defense mechanisms
Abstract
Automatic Speaker Recognition Systems (SRSs) have been widely used in voice applications for personal identification and access control. A typical SRS consists of three stages, i.e., training, enrollment, and recognition. Previous work has revealed that SRSs can be bypassed by backdoor attacks at the training stage or by adversarial example attacks at the recognition stage. In this paper, we propose Tuner, a new type of backdoor attack against the enrollment stage of SRS via adversarial ultrasound modulation, which is inaudible, synchronization-free, content-independent, and black-box. Our key idea is to first inject the backdoor into the SRS with modulated ultrasound when a legitimate user initiates the enrollment, and afterward, the polluted SRS will grant access to both the legitimate user and the adversary with high confidence. Our attack faces a major challenge of unpredictable…
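To make the enrollment-stage idea in the abstract concrete, here is a toy sketch of how a polluted enrollment template could end up accepting two speakers. It is not the authors' method. It assumes, purely for illustration, that the SRS averages per-utterance embeddings into a template and verifies by cosine similarity against a fixed threshold, and that the injected signal shifts each enrollment embedding toward the adversary's direction. The names `enroll`, `verify`, and `cosine`, the 3-dimensional vectors, and the 0.6 threshold are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def enroll(embeddings):
    """Assumed enrollment rule: average the per-utterance embeddings."""
    n = len(embeddings)
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / n for i in range(dim)]

def verify(template, probe, threshold=0.6):
    """Assumed decision rule: accept if similarity reaches the threshold."""
    return cosine(template, probe) >= threshold

# Toy embeddings (hypothetical): orthogonal directions for the two speakers.
user = [1.0, 0.0, 0.0]
adversary = [0.0, 1.0, 0.0]

# Clean enrollment: three utterances from the legitimate user only.
clean_template = enroll([user, user, user])

# Polluted enrollment (assumption): every utterance embedding is shifted
# toward the adversary's direction by the injected signal.
polluted_utterances = [[u + a for u, a in zip(user, adversary)] for _ in range(3)]
polluted_template = enroll(polluted_utterances)

print("clean, user:        ", verify(clean_template, user))
print("clean, adversary:   ", verify(clean_template, adversary))
print("polluted, user:     ", verify(polluted_template, user))
print("polluted, adversary:", verify(polluted_template, adversary))
```

In this toy setting the clean template accepts only the user, while the polluted template sits between the two directions and accepts both. Real systems use learned, high-dimensional embeddings and different scoring back ends, so this only illustrates the general shape of the problem.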
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Geophysical Methods and Applications
