Post-training for Deepfake Speech Detection
Wanying Ge, Xin Wang, Xuechen Liu, Junichi Yamagishi

TL;DR
This paper presents a post-training method that adapts self-supervised learning (SSL) models for deepfake speech detection, improving robustness and generalization across multiple languages and datasets.
Contribution
It introduces AntiDeepfake, a series of models built with a novel post-training approach that adapts SSL models for deepfake detection using large-scale multilingual data.
Findings
Post-trained models show strong robustness to unseen deepfake speech.
Further fine-tuning surpasses state-of-the-art detectors on Deepfake-Eval-2024.
Models generalize well across more than one hundred languages.
Abstract
We introduce a post-training approach that adapts self-supervised learning (SSL) models for deepfake speech detection by bridging the gap between general pre-training and domain-specific fine-tuning. We present AntiDeepfake models, a series of post-trained models developed using a large-scale multilingual speech dataset containing over 56,000 hours of genuine speech and 18,000 hours of speech with various artifacts in over one hundred languages. Experimental results show that the post-trained models already exhibit strong robustness and generalization to unseen deepfake speech. When they are further fine-tuned on the Deepfake-Eval-2024 dataset, these models consistently surpass existing state-of-the-art detectors that do not leverage post-training. Model checkpoints and source code are available online.
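The abstract does not specify the classifier head or the training loss, so the following is only a minimal sketch of one simple way to frame binary deepfake detection on top of an SSL model: frame-level SSL features are mean-pooled into one utterance vector, a linear layer maps that vector to a single logit, and a binary cross-entropy loss is computed on the logit. The mean pooling, the linear head, the label convention (1 = speech with artifacts), and the idea of up-weighting the minority class by the genuine-to-artifact hour ratio from the abstract (56,000 / 18,000 ≈ 3.11) are all assumptions for illustration, and every function name here is hypothetical. This is not the AntiDeepfake implementation.

```python
import math
from typing import List, Sequence

# Hour counts quoted in the abstract; "over 56,000" means this is approximate.
GENUINE_HOURS = 56_000.0
ARTIFACT_HOURS = 18_000.0


def positive_class_weight(genuine_hours: float, artifact_hours: float) -> float:
    """Hypothetical weight for the minority class (label 1 = artifacts) so that
    both classes contribute roughly equally to the loss."""
    return genuine_hours / artifact_hours


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average frame-level SSL features (T frames x D dims) into one D-dim vector."""
    num_frames = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / num_frames for d in range(dim)]


def detect_logit(frames: Sequence[Sequence[float]],
                 weights: Sequence[float], bias: float) -> float:
    """Hypothetical linear head: pooled features -> one logit
    (a higher value means 'more likely to contain artifacts')."""
    pooled = mean_pool(frames)
    return sum(w * x for w, x in zip(weights, pooled)) + bias


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def weighted_bce(logit: float, label: int, pos_weight: float) -> float:
    """Binary cross-entropy on a raw logit, scaling the positive-class term.
    -log(sigmoid(z)) = softplus(-z); -log(1 - sigmoid(z)) = softplus(z)."""
    if label == 1:
        return pos_weight * softplus(-logit)
    return softplus(logit)


if __name__ == "__main__":
    pw = positive_class_weight(GENUINE_HOURS, ARTIFACT_HOURS)
    toy_frames = [[1.0, 2.0], [3.0, 4.0]]  # two frames of 2-dim toy "features"
    z = detect_logit(toy_frames, weights=[0.5, -0.5], bias=0.0)
    print(f"pos_weight={pw:.2f} logit={z:.2f} "
          f"loss_if_fake={weighted_bce(z, 1, pw):.3f} "
          f"loss_if_genuine={weighted_bce(z, 0, pw):.3f}")
```

Working on raw logits with a stable softplus, rather than applying a sigmoid first and then taking a logarithm, avoids overflow and `log(0)` for confident predictions. A real system would compute these operations on batched tensors and learn `weights` and `bias` by gradient descent.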
Code & Models
- 🤗 nii-yamagishilab/mms-300m-anti-deepfake (model) · 111 downloads
- 🤗 nii-yamagishilab/mms-1b-anti-deepfake (model) · 22 downloads · 1 like
- 🤗 nii-yamagishilab/xls-r-1b-anti-deepfake (model) · 14 downloads
- 🤗 nii-yamagishilab/wav2vec-large-anti-deepfake (model) · 78 downloads · 2 likes
- 🤗 nii-yamagishilab/xls-r-2b-anti-deepfake (model) · 46 downloads · 4 likes
- 🤗 nii-yamagishilab/wav2vec-small-anti-deepfake (model) · 88 downloads
- 🤗 nii-yamagishilab/hubert-xlarge-anti-deepfake (model) · 6 downloads · 1 like
- 🤗 nii-yamagishilab/hubert-xlarge-anti-deepfake-nda (model)
- 🤗 nii-yamagishilab/wav2vec-small-anti-deepfake-nda (model) · 36 downloads
- 🤗 nii-yamagishilab/wav2vec-large-anti-deepfake-nda (model) · 772 downloads
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing
