Securing Voice Biometrics: One-Shot Learning Approach for Audio Deepfake Detection
Awais Khan, Khalid Mahmood Malik

TL;DR
This paper introduces Quick-SpoofNet, a one-shot learning approach utilizing metric learning to detect both seen and unseen audio deepfake attacks in voice biometrics, demonstrating improved generalization over existing methods.
Contribution
The paper presents a novel one-shot learning method with metric learning for robust detection of synthetic speech attacks, including unseen types, in voice verification systems.
Findings
Effective detection of both seen and unseen deepfake attacks.
Good generalization to unseen bona fide speech data.
Outperforms existing models on ASVspoof datasets.
Abstract
Automatic Speaker Verification (ASV) systems are vulnerable to fraudulent activities using audio deepfakes, also known as logical-access voice spoofing attacks. These deepfakes pose a serious threat to voice biometrics due to recent advancements in generative AI and speech synthesis technologies. While several deep learning models for synthetic speech detection have been developed, most of them generalize poorly, especially when the attacks have statistical distributions different from those seen during training. Therefore, this paper presents Quick-SpoofNet, an approach for detecting both seen and unseen synthetic attacks in the ASV system using one-shot learning and metric learning techniques. Using an effective spectral feature set, the proposed method extracts compact and representative temporal embeddings from the voice samples and utilizes metric learning and triplet loss to…
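The abstract names metric learning with a triplet loss and one-shot classification but is cut off before giving details. The sketch below illustrates the generic, textbook form of both ideas on made-up 2-D embeddings. It is not Quick-SpoofNet's implementation: the Euclidean distance, the margin of 1.0, the toy vectors, and the function names are all assumptions chosen for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is farther from the anchor
    than the positive by at least `margin` (an assumed value here).
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def one_shot_predict(query, support):
    """Label a query by its nearest reference embedding.

    `support` maps each class label to ONE reference embedding,
    which is what makes this a one-shot comparison.
    """
    return min(support, key=lambda label: euclidean(query, support[label]))

# Toy embeddings, invented for illustration only.
support = {"bona fide": [0.0, 0.0], "spoof": [3.0, 4.0]}

# Positive is close to the anchor, negative is far: no penalty.
loss_easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
# Positive is far, negative is close: large penalty.
loss_hard = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])

label = one_shot_predict([2.5, 3.5], support)
print(loss_easy, loss_hard, label)
```

In a trained system the embeddings would come from a learned network over spectral features rather than hand-written vectors. The point of the sketch is only that a well-separated embedding space lets a single reference sample per class decide a query's label.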
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
