Rapidly Adapting to New Voice Spoofing: Few-Shot Detection of Synthesized Speech Under Distribution Shifts

Ashi Garg; Zexin Cai; Henry Li Xinyuan; Leibny Paola García-Perera; Kevin Duh; Sanjeev Khudanpur; Matthew Wiesner; Nicholas Andrews

arXiv:2508.13320·eess.AS·August 20, 2025

TL;DR

This paper introduces a self-attentive prototypical network for few-shot detection of synthesized speech. It adapts rapidly to unseen speech synthesis methods, speakers, and languages under distribution shifts, achieving up to 32% relative EER reduction with as few as 10 in-distribution samples.

Contribution

The paper proposes a novel self-attentive prototypical network that enhances few-shot adaptation for voice spoofing detection under distribution shifts, outperforming traditional zero-shot detectors.

Findings

01. Achieves up to 32% relative EER reduction on deepfakes in Japanese.

02. Achieves up to 20% relative EER reduction on ASVspoof 2021 Deepfake dataset.

03. Effective with as few as 10 in-distribution samples.
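
The findings above are reported as relative EER (equal error rate) reductions. As a rough illustration of what that metric means, the sketch below computes a relative reduction from a baseline EER and an adapted EER. The function name and the numbers are hypothetical and chosen only to make the arithmetic visible; they are not taken from the paper.

```python
def relative_eer_reduction(baseline_eer: float, adapted_eer: float) -> float:
    """Return the fractional drop in EER relative to the baseline.

    Both arguments must use the same unit (for example, percent).
    """
    if baseline_eer <= 0:
        raise ValueError("baseline EER must be positive")
    return (baseline_eer - adapted_eer) / baseline_eer


# Hypothetical values, NOT results from the paper:
# a detector at 10.0% EER that improves to 6.8% EER after adaptation.
example = relative_eer_reduction(10.0, 6.8)
print(f"relative EER reduction: {example:.0%}")
```

Under this definition, a 32% relative reduction means the adapted system's EER is 68% of the baseline's EER, whatever the absolute values are.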

Abstract

We address the challenge of detecting synthesized speech under distribution shifts -- arising from unseen synthesis methods, speakers, languages, or audio conditions -- relative to the training data. Few-shot learning methods are a promising way to tackle distribution shifts by rapidly adapting on the basis of a few in-distribution samples. We propose a self-attentive prototypical network to enable more robust few-shot adaptation. To evaluate our approach, we systematically compare the performance of traditional zero-shot detectors and the proposed few-shot detectors, carefully controlling training conditions to introduce distribution shifts at evaluation time. In conditions where distribution shifts hamper the zero-shot performance, our proposed few-shot adaptation technique can quickly adapt using as few as 10 in-distribution samples -- achieving up to 32% relative EER reduction on…
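
To make the few-shot idea concrete, here is a minimal sketch of plain prototypical classification: each class prototype is the mean of a handful of support embeddings, and a query is assigned to the nearest prototype. This is the standard mean-based formulation, not the paper's self-attentive variant, whose details are not given on this page. The function names, the two-dimensional toy embeddings, and the class labels are all assumptions made for illustration; in practice the embeddings would come from a trained speech encoder.

```python
from math import fsum


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [fsum(column) / n for column in zip(*vectors)]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return fsum((x - y) ** 2 for x, y in zip(a, b))


def build_prototypes(support):
    """Map each label to the mean of its support embeddings."""
    return {label: mean_vector(vecs) for label, vecs in support.items()}


def classify(query, prototypes):
    """Return the label whose prototype is closest to the query."""
    return min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))


# Hypothetical toy embeddings standing in for encoder outputs.
support = {
    "bonafide": [[0.9, 0.1], [1.1, -0.1], [1.0, 0.0]],
    "spoof": [[-1.0, 0.2], [-0.8, -0.2], [-1.2, 0.0]],
}
prototypes = build_prototypes(support)
prediction = classify([0.7, 0.3], prototypes)
print(prediction)
```

Adapting to a new condition in this scheme only requires recomputing the prototypes from a few in-distribution samples, with no gradient updates to the encoder.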
