Post-training for Deepfake Speech Detection

Wanying Ge; Xin Wang; Xuechen Liu; Junichi Yamagishi

arXiv:2506.21090·eess.AS·October 22, 2025

TL;DR

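This paper presents a post-training stage that adapts self-supervised learning (SSL) models for deepfake speech detection, placed between general pre-training and task-specific fine-tuning. The resulting models are robust to unseen deepfake speech and generalize across multiple languages and datasets.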

Contribution

It introduces AntiDeepfake, a series of SSL models post-trained for deepfake detection on a large-scale multilingual dataset with over 56,000 hours of genuine speech and 18,000 hours of speech with various artifacts.

Findings

01 Post-trained models already show strong robustness and generalization to unseen deepfake speech.

02 After further fine-tuning on Deepfake-Eval-2024, the models consistently surpass existing state-of-the-art detectors that do not use post-training.

03 Models generalize well across more than one hundred languages.

Abstract

We introduce a post-training approach that adapts self-supervised learning (SSL) models for deepfake speech detection by bridging the gap between general pre-training and domain-specific fine-tuning. We present AntiDeepfake models, a series of post-trained models developed using a large-scale multilingual speech dataset containing over 56,000 hours of genuine speech and 18,000 hours of speech with various artifacts in over one hundred languages. Experimental results show that the post-trained models already exhibit strong robustness and generalization to unseen deepfake speech. When they are further fine-tuned on the Deepfake-Eval-2024 dataset, these models consistently surpass existing state-of-the-art detectors that do not leverage post-training. Model checkpoints and source code are available online.
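The hour counts quoted in the abstract imply an uneven split between genuine speech and speech with artifacts. The sketch below works through that arithmetic and shows one common way such an imbalance could be offset in a training loss. It is an illustration only: the abstract does not say how, or whether, the authors reweight classes, the function names are hypothetical, and the hour figures are the rounded values from the abstract rather than exact dataset sizes.

```python
from typing import Dict

# Rounded figures quoted in the abstract (the genuine total is "over" 56,000 hours).
GENUINE_HOURS = 56_000
ARTIFACT_HOURS = 18_000


def class_shares(genuine_hours: float, artifact_hours: float) -> Dict[str, float]:
    """Return each class's share of the total audio duration."""
    total = genuine_hours + artifact_hours
    return {
        "genuine": genuine_hours / total,
        "artifact": artifact_hours / total,
    }


def inverse_frequency_weights(shares: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical loss weights, 1 / (num_classes * share).

    A perfectly balanced dataset would give every class a weight of 1.0.
    This is an assumed scheme, not one described in the abstract.
    """
    num_classes = len(shares)
    return {label: 1.0 / (num_classes * share) for label, share in shares.items()}


shares = class_shares(GENUINE_HOURS, ARTIFACT_HOURS)
weights = inverse_frequency_weights(shares)

for label in shares:
    print(f"{label}: share={shares[label]:.3f}, weight={weights[label]:.3f}")
```

By duration, roughly three quarters of the data is genuine speech, so an inverse-frequency scheme gives the artifact class the larger weight.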

Code & Models

Repositories

nii-yamagishilab/antideepfake
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing