Diffusion Reconstruction towards Generalizable Audio Deepfake Detection

Bo Cheng; Songjun Cao; Xiaoming Zhang; Jie Chen; Long Ma; Fei Chen

arXiv:2604.26465 · cs.SD · April 30, 2026

TL;DR

This paper generates hard training samples through diffusion-based reconstruction and pairs them with a contrastive learning objective, so that Audio Deepfake Detection (ADD) models generalize better to unseen attacks.

Contribution

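The paper proposes diffusion-based reconstruction for generating hard samples, combined with multi-layer feature aggregation and a Regularization-Assisted Contrastive Learning (RACL) objective, to make detectors more robust against unseen audio deepfake attacks.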

Findings

01. The best model achieves a significant reduction in average Equal Error Rate (EER) compared to the baseline.

02. Diffusion-based hard sample generation outperforms the other reconstruction paradigms investigated.

03. Experiments demonstrate enhanced generalization to unseen attacks.

Abstract

Achieving robust generalization against unseen attacks remains a challenge in Audio Deepfake Detection (ADD), driven by the rapid evolution of generative models. To address this, we propose a framework centered on hard sample classification. The core idea is that a model capable of distinguishing challenging hard samples is inherently equipped to handle simpler cases effectively. We investigate multiple reconstruction paradigms, identifying the diffusion-based method as optimal for generating hard samples. Furthermore, we leverage multi-layer feature aggregation and introduce a Regularization-Assisted Contrastive Learning (RACL) objective to enhance generalizability. Experiments demonstrate the superior generalization of our approach, with our best model achieving a significant reduction in the average Equal Error Rate (EER) compared to the baseline.
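The abstract reports results as Equal Error Rate (EER), the operating point at which the false acceptance rate (spoofed audio accepted as bona fide) equals the false rejection rate (bona fide audio rejected). The following is a minimal sketch of how EER can be computed from detector scores, not the authors' evaluation code. The function name `compute_eer`, the example scores, and the convention that a higher score means "more likely bona fide" are all assumptions made for illustration.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Return (eer, threshold) for two lists of detector scores.

    Assumption: higher score = more likely bona fide. A trial is
    accepted as bona fide when its score is >= threshold.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    # Candidate thresholds: every observed score, plus one value above
    # all scores so that "reject everything" is also considered.
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    thresholds.append(thresholds[-1] + 1.0)

    best = None
    for t in thresholds:
        # False acceptance rate: spoofed trials accepted as bona fide.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection rate: bona fide trials rejected.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # With finitely many trials FAR and FRR may never be exactly
            # equal, so report their mean at the closest crossing.
            best = (gap, (far + frr) / 2.0, t)

    return best[1], best[2]


# Hypothetical scores, for illustration only.
bonafide = [0.9, 0.8, 0.7, 0.3]
spoof = [0.1, 0.2, 0.4, 0.6]
eer, threshold = compute_eer(bonafide, spoof)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

An "average EER" such as the one the abstract mentions would then be the mean of this value over several evaluation sets. Which sets are averaged is not stated in this summary.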
