Vulnerabilities of Audio-Based Biometric Authentication Systems Against Deepfake Speech Synthesis

Mengze Hong; Di Jiang; Zeying Xie; Weiwei Zhao; Guan Wang; Chen Jason Zhang

arXiv:2601.02914 · cs.SD · January 7, 2026

Open Access

TL;DR

This paper empirically evaluates the security of audio-based biometric systems against deepfake speech. It finds that current speaker verification models are easily fooled and that anti-spoofing detectors generalize poorly across synthesis methods, which highlights the need for improved defenses.

Contribution

The paper provides a large-scale empirical assessment of how vulnerable speaker verification is to deepfake speech, documents the limits of current anti-spoofing methods, and proposes directions for future security enhancements.

Findings

01 Voice cloning models trained on very small samples can bypass commercial speaker verification

02 Anti-spoofing detectors lack robustness across audio synthesis methods

03 Significant gap between in-domain performance and real-world robustness

Abstract

As audio deepfakes transition from research artifacts to widely available commercial tools, robust biometric authentication faces pressing security threats in high-stakes industries. This paper presents a systematic empirical evaluation of state-of-the-art speaker authentication systems based on a large-scale speech synthesis dataset, revealing two major security vulnerabilities: 1) modern voice cloning models trained on very small samples can easily bypass commercial speaker verification systems; and 2) anti-spoofing detectors struggle to generalize across different methods of audio synthesis, leading to a significant gap between in-domain performance and real-world robustness. These findings call for a reconsideration of security measures and stress the need for architectural innovations, adaptive defenses, and the transition towards multi-factor authentication.
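One way to make the gap between in-domain performance and real-world robustness concrete is to compare an error rate on spoofs from synthesis methods seen in training against spoofs from unseen methods. The sketch below uses the equal error rate (EER), a metric commonly reported for anti-spoofing detectors; the abstract does not say which metric the authors use, so this choice is an assumption. The function name `equal_error_rate` and all score lists are hypothetical toy values chosen for illustration, not results from the paper.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by sweeping thresholds over the observed scores.

    Assumes a higher score means "more likely bona fide".
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection rate: bona fide audio scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoofed audio scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy detector scores (illustrative only, NOT data from the paper).
bonafide = [0.9, 0.8, 0.7, 0.6]
spoof_seen_method = [0.1, 0.2, 0.3, 0.4]       # synthesis method seen in training
spoof_unseen_method = [0.1, 0.65, 0.75, 0.85]  # synthesis method not seen in training

print("in-domain EER:   ", equal_error_rate(bonafide, spoof_seen_method))
print("cross-method EER:", equal_error_rate(bonafide, spoof_unseen_method))
```

In this toy setup the detector separates bona fide audio from seen-method spoofs cleanly but overlaps heavily with unseen-method spoofs, which is the kind of generalization gap the abstract describes.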

Taxonomy

Topics: Speech Recognition and Synthesis · User Authentication and Security Systems · Digital Media Forensic Detection