A Novel Unified Approach to Deepfake Detection
Lord Sen, Shyamapada Mukherjee

TL;DR
This paper introduces a new deepfake detection architecture that combines spatial and frequency features through cross attention, reporting state-of-the-art AUC and strong cross-dataset generalization for images and videos.
Contribution
It presents a unified deepfake detection model utilizing cross attention and multiple feature domains, outperforming existing methods in accuracy and robustness.
Findings
Achieves 99.80% AUC on the FF++ dataset (Swin Transformer and BERT)
Achieves 99.88% AUC on the Celeb-DF dataset (Swin Transformer and BERT)
Demonstrates strong cross-dataset generalization
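For readers unfamiliar with the metric reported above, AUC can be read as the probability that a randomly chosen fake sample receives a higher score than a randomly chosen real one. The following is a minimal sketch of that pairwise definition, not the paper's evaluation code; the function name `roc_auc`, the label convention (1 = fake, 0 = real), and the toy scores are all assumptions made for illustration.

```python
def roc_auc(labels, scores):
    """Pairwise AUC: fraction of (fake, real) pairs ranked correctly.

    Assumed convention (not from the paper): label 1 = fake, 0 = real,
    and a higher score means "more likely fake". Ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data for illustration only; these are not results from the paper.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
auc = roc_auc(labels, scores)  # 8 of the 9 (fake, real) pairs are ordered correctly
print(f"AUC = {auc:.3f}")
```

This quadratic-time form is only meant to make the definition concrete; evaluation on full datasets would normally use a sorted, rank-based implementation from an established library.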
Abstract
Advances in the field of AI are increasingly giving rise to various threats. One of the most prominent is the synthesis and misuse of deepfakes. To sustain trust in this digital age, detecting and tagging deepfakes is essential. In this paper, a novel architecture for deepfake detection in images and videos is presented. The architecture uses cross attention between spatial and frequency domain features, along with a blood detection module, to classify an image as real or fake. This paper aims to develop a unified architecture and provide insights into each step. Through this approach we achieve results better than SOTA: 99.80% and 99.88% AUC on FF++ and Celeb-DF, respectively, using Swin Transformer and BERT, and 99.55% and 99.38% using EfficientNet-B4 and BERT. The approach also generalizes well, achieving strong cross-dataset results.
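The abstract does not spell out how the cross attention is wired, so the following is only a generic sketch of scaled dot-product cross attention, assuming spatial tokens act as queries and frequency tokens supply keys and values. The token counts, the feature dimension, the random inputs, and the absence of learned projection matrices are all simplifications chosen for illustration, not details taken from the paper.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (lists of lists)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention with queries from one feature domain
    and keys/values from another. No learned projections (a simplification)."""
    d = len(queries[0])
    scores = matmul(queries, transpose(keys))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, values), weights


# Hypothetical inputs: 4 spatial tokens and 6 frequency tokens, each of dimension 8.
random.seed(0)
spatial = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]
frequency = [[random.gauss(0, 1) for _ in range(8)] for _ in range(6)]

# Each spatial token attends over all frequency tokens.
fused, weights = cross_attention(spatial, frequency, frequency)
print(len(fused), len(fused[0]))  # one fused vector per spatial token
```

In a trained model the queries, keys, and values would each pass through learned linear projections, and the fused output would feed a classifier head; both are omitted here to keep the mechanism itself visible.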
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection · Adversarial Robustness in Machine Learning
