RPRA-ADD: Forgery Trace Enhancement-Driven Audio Deepfake Detection

Ruibo Fu; Xiaopeng Wang; Zhengqi Wen; Jianhua Tao; Yuankun Xie; Zhiyong Wang; Chunyu Qiang; Xuefei Liu; Cunhang Fan; Chenxing Li; Guanjun Li

arXiv:2506.00375 · cs.SD · June 3, 2025

TL;DR

This paper introduces RPRA-ADD, a novel audio deepfake detection framework that enhances forgery trace perception and generalization across diverse datasets and attack types, achieving state-of-the-art results.

Contribution

The paper proposes a new integrated framework with a global-local perception module, dispersal loss, and attention mechanism to improve deepfake detection robustness and generalization.

Findings

01. Achieves over 20% performance improvement on benchmark datasets.

02. Outperforms existing methods in cross-domain evaluations.

03. Demonstrates enhanced attention to forgery traces and generalization capability.

Abstract

Existing methods for deepfake audio detection have demonstrated some effectiveness, but they still struggle to generalize to new forgery techniques and evolving attack patterns. This limitation arises mainly because the models rely heavily on the distribution of the training data and fail to learn a decision boundary that captures the essential characteristics of forgeries. In addition, relying solely on a classification loss makes it difficult to capture the intrinsic differences between real and fake audio. In this paper, we propose RPRA-ADD, a robust audio deepfake detection framework driven by forgery trace enhancement and built on integrated Reconstruction-Perception-Reinforcement-Attention networks. First, we propose a Global-Local Forgery Perception (GLFP) module to enhance the acoustic perception of forgery traces. To significantly reinforce the feature…

Taxonomy

Topics: Digital Media Forensic Detection · Speech Recognition and Synthesis · Music and Audio Processing