SyntheticPop: Attacking Speaker Verification Systems With Synthetic VoicePops
Eshaq Jamdar, Amith Kamath Belman

TL;DR
This paper introduces SyntheticPop, an attack that embeds synthetic noises into spoofed audio to degrade voice authentication systems hardened with VoicePop, showing that the defense is vulnerable to this kind of adversarial manipulation.
Contribution
We propose SyntheticPop, an attack that degrades the performance of voice authentication with VoicePop (VA+VoicePop) by embedding synthetic noises into spoofed audio, highlighting the need for more robust defenses against such attacks.
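As a rough illustration of what embedding a synthetic noise into an audio sample could look like, the Python sketch below mixes a short, decaying low-frequency burst into a waveform. The burst shape, the function names `make_pop` and `embed_pop`, and every parameter value are assumptions made for illustration; this summary does not specify how SyntheticPop generates or places its noises.

```python
import numpy as np

def make_pop(sample_rate=16000, duration_s=0.02, freq_hz=80.0, decay=200.0):
    """Hypothetical 'pop': a low-frequency sine with an exponential decay.

    All default values are illustrative assumptions, not taken from the paper.
    """
    n_samples = int(round(sample_rate * duration_s))
    t = np.arange(n_samples) / sample_rate
    return np.sin(2 * np.pi * freq_hz * t) * np.exp(-decay * t)

def embed_pop(audio, pop, offset, gain=0.5):
    """Return a copy of `audio` with `pop` added starting at sample `offset`."""
    out = np.asarray(audio, dtype=np.float64).copy()
    if not 0 <= offset <= len(out):
        raise ValueError("offset must lie within the audio")
    # Truncate the burst if it would run past the end of the audio.
    end = min(offset + len(pop), len(out))
    out[offset:end] += gain * pop[: end - offset]
    # Keep the mixed signal inside the usual [-1, 1] float range.
    return np.clip(out, -1.0, 1.0)

# Usage: mix one burst into one second of silence at sample 100.
silence = np.zeros(16000)
mixed = embed_pop(silence, make_pop(), offset=100)
```

The input array is copied rather than modified, so the clean and altered versions of a sample can be compared side by side.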
Findings
SyntheticPop achieves an attack success rate of over 95%.
VA+VoicePop accuracy drops to 14% under the SyntheticPop attack.
The baseline label-flipping attack reduces accuracy to 37%.
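Label flipping, the baseline named in the findings, is generally understood as a data-poisoning technique in which a fraction of training labels is inverted. The sketch below shows that generic idea for binary labels. The helper name `flip_labels`, the seed, and the 20% rate in the usage line are illustrative assumptions; this summary does not describe how the paper's baseline was configured.

```python
import numpy as np

def flip_labels(labels, rate, seed=0):
    """Invert a random fraction `rate` of binary (0/1) labels.

    Returns the poisoned copy and the indices that were flipped.
    Hypothetical helper for illustration only.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    poisoned = np.asarray(labels).copy()
    rng = np.random.default_rng(seed)
    n_flip = int(round(rate * len(poisoned)))
    # Sample distinct positions so each chosen label is flipped exactly once.
    idx = rng.choice(len(poisoned), size=n_flip, replace=False)
    poisoned[idx] = 1 - poisoned[idx]
    return poisoned, idx

# Usage: flip an arbitrary 20% of 100 labels that all start as 0.
clean = np.zeros(100, dtype=int)
poisoned, flipped_idx = flip_labels(clean, rate=0.2)
```

Returning the flipped indices makes it easy to check which training samples were altered when measuring the effect on a model.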
Abstract
Voice Authentication (VA), also known as Automatic Speaker Verification (ASV), is a widely adopted authentication method, particularly in automated systems like banking services, where it serves as a secondary layer of user authentication. Despite its popularity, VA systems are vulnerable to various attacks, including replay, impersonation, and the emerging threat of deepfake audio that mimics the voice of legitimate users. To mitigate these risks, several defense mechanisms have been proposed. One such solution, Voice Pops, aims to distinguish an individual's unique phoneme pronunciations during the enrollment process. While promising, the effectiveness of VA+VoicePop against a broader range of attacks, particularly logical or adversarial attacks, remains insufficiently explored. We propose a novel attack method, which we refer to as SyntheticPop, designed to target the phoneme…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Voice and Speech Disorders
