Evolving from Single-modal to Multi-modal Facial Deepfake Detection: Progress and Challenges

Ping Liu; Qiqi Tao; Joey Tianyi Zhou

arXiv:2406.06965·cs.CV·April 4, 2025·3 cites

TL;DR

This survey reviews the evolution of deepfake detection from single-modal to multi-modal approaches, highlighting recent advancements, challenges, and the role of vision-language models in improving detection robustness.

Contribution

It provides the most comprehensive analysis to date of multi-modal deepfake detection techniques, datasets, and emerging research directions, surpassing prior surveys focused on earlier methods.

Findings

01 Transition from GAN-based to diffusion models increases detection difficulty.

02 Multi-modal approaches enhance robustness against sophisticated deepfakes.

03 Vision-Language Models improve detection accuracy and generalization.

Abstract

As synthetic media, including video, audio, and text, become increasingly indistinguishable from real content, the risks of misinformation, identity fraud, and social manipulation escalate. This survey traces the evolution of deepfake detection from early single-modal methods to sophisticated multi-modal approaches that integrate audio-visual and text-visual cues. We present a structured taxonomy of detection techniques and analyze the transition from GAN-based to diffusion model-driven deepfakes, which introduce new challenges due to their heightened realism and robustness against detection. Unlike prior surveys that primarily focus on single-modal detection or earlier deepfake techniques, this work provides the most comprehensive study to date, encompassing the latest advancements in multi-modal deepfake detection, generalization challenges, proactive defense mechanisms, and emerging…

Taxonomy

Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis

Methods: Diffusion