Vulnerabilities of Audio-Based Biometric Authentication Systems Against Deepfake Speech Synthesis
Mengze Hong, Di Jiang, Zeying Xie, Weiwei Zhao, Guan Wang, Chen Jason Zhang

TL;DR
This paper empirically evaluates the security vulnerabilities of audio-based biometric systems against deepfake speech, revealing that current models are easily fooled and anti-spoofing detectors lack robustness, highlighting the need for improved defenses.
Contribution
It provides a large-scale empirical assessment of speaker verification vulnerabilities to deepfake speech, emphasizing the challenges in current anti-spoofing methods and proposing directions for future security enhancements.
Findings
Voice cloning models trained on small samples can bypass verification
Anti-spoofing detectors lack cross-method robustness
Significant gap between in-domain and real-world performance
Abstract
As audio deepfakes transition from research artifacts to widely available commercial tools, robust biometric authentication faces pressing security threats in high-stakes industries. This paper presents a systematic empirical evaluation of state-of-the-art speaker authentication systems based on a large-scale speech synthesis dataset, revealing two major security vulnerabilities: 1) modern voice cloning models trained on very small samples can easily bypass commercial speaker verification systems; and 2) anti-spoofing detectors struggle to generalize across different methods of audio synthesis, leading to a significant gap between in-domain performance and real-world robustness. These findings call for a reconsideration of security measures and stress the need for architectural innovations, adaptive defenses, and the transition towards multi-factor authentication.
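The first vulnerability follows from how speaker verification typically decides. Many back ends embed each utterance as a fixed-length speaker vector and accept the claim when a similarity score against the enrolled vector clears a threshold. The abstract does not say which scoring the evaluated commercial systems use, so the following is only a generic sketch that assumes cosine scoring. The names `cosine_similarity` and `verify`, the threshold of 0.7, and the toy 4-dimensional vectors are all hypothetical and are not taken from the paper. The sketch shows that the decision checks only whether the probe embedding lands close to the enrolled one. A cloned utterance whose embedding lands nearby is therefore accepted exactly as a genuine one would be.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two speaker embeddings of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def verify(enrolled: Sequence[float], probe: Sequence[float],
           threshold: float = 0.7) -> bool:
    """Hypothetical accept/reject rule: accept if similarity meets the threshold.

    Nothing here inspects whether the probe audio is synthetic; only the
    geometric closeness of the embeddings matters.
    """
    return cosine_similarity(enrolled, probe) >= threshold


# Toy embeddings (hypothetical values for illustration, not results from the paper).
enrolled = [0.9, 0.1, 0.3, 0.2]
genuine = [0.85, 0.15, 0.28, 0.22]   # another recording of the enrolled speaker
cloned = [0.8, 0.2, 0.35, 0.15]      # a synthetic utterance that mimics the speaker
impostor = [0.1, 0.9, 0.2, 0.4]      # a different human speaker

for name, probe in [("genuine", genuine), ("cloned", cloned), ("impostor", impostor)]:
    score = cosine_similarity(enrolled, probe)
    print(f"{name}: score={score:.3f} accepted={verify(enrolled, probe)}")
```

In this toy setup the genuine and cloned probes are both accepted and the impostor is rejected. This is why a separate anti-spoofing detector is needed in front of the verifier, and why that detector's ability to generalize across synthesis methods matters.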
Taxonomy
Topics: Speech Recognition and Synthesis · User Authentication and Security Systems · Digital Media Forensic Detection
