Evolving from Single-modal to Multi-modal Facial Deepfake Detection: Progress and Challenges
Ping Liu, Qiqi Tao, Joey Tianyi Zhou

TL;DR
This survey reviews the evolution of deepfake detection from single-modal to multi-modal approaches, highlighting recent advancements, challenges, and the role of vision-language models in improving detection robustness.
Contribution
It provides the most comprehensive analysis to date of multi-modal deepfake detection techniques, datasets, and emerging research directions, surpassing prior surveys focused on earlier methods.
Findings
The transition from GAN-based to diffusion-based deepfake generation increases detection difficulty.
Multi-modal approaches enhance robustness against sophisticated deepfakes.
Vision-language models improve detection accuracy and generalization.
Abstract
As synthetic media, including video, audio, and text, become increasingly indistinguishable from real content, the risks of misinformation, identity fraud, and social manipulation escalate. This survey traces the evolution of deepfake detection from early single-modal methods to sophisticated multi-modal approaches that integrate audio-visual and text-visual cues. We present a structured taxonomy of detection techniques and analyze the transition from GAN-based to diffusion model-driven deepfakes, which introduce new challenges due to their heightened realism and robustness against detection. Unlike prior surveys that primarily focus on single-modal detection or earlier deepfake techniques, this work provides the most comprehensive study to date, encompassing the latest advancements in multi-modal deepfake detection, generalization challenges, proactive defense mechanisms, and emerging…
Taxonomy
Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis
Methods: Diffusion
