VRAG-DFD: Verifiable Retrieval-Augmentation for MLLM-based Deepfake Detection

Hui Han; Shunli Wang; Yandan Zhao; Taiping Yao; Shouhong Ding

arXiv:2604.13660 · cs.CV · April 20, 2026

TL;DR

This paper introduces VRAG-DFD, a novel framework combining retrieval-augmented generation and reinforcement learning to enhance deepfake detection by providing high-quality forgery knowledge and critical reasoning abilities to multimodal large language models.

Contribution

The paper proposes a new retrieval-augmented deepfake detection framework with datasets and training strategies to improve knowledge accuracy and reasoning in MLLMs.

Findings

01 Achieved state-of-the-art performance on deepfake detection benchmarks.

02 Constructed two datasets: FKD and F-CoT for knowledge annotation and reasoning.

03 Demonstrated improved generalization in deepfake detection tasks.

Abstract

For Deepfake Detection (DFD), researchers have proposed two types of MLLM-based methods: complementary combination with small DFD detectors, and static forgery knowledge injection. A lack of professional forgery knowledge limits the performance of these DFD-MLLMs. To address this, we consider two questions: how can high-quality, relevant forgery knowledge be provided to MLLMs, and how can MLLMs be endowed with critical reasoning abilities when the reference information is noisy? We offer preliminary answers to both questions by combining Retrieval-Augmented Generation (RAG) and Reinforcement Learning (RL). Using RAG and RL, we propose the VRAG-DFD framework, which offers accurate dynamic forgery knowledge retrieval and strong critical reasoning capabilities. Specifically, in terms of data, we constructed two datasets with RAG:…
