Deepfake Video Detection Using Convolutional Vision Transformer
Deressa Wodajo, Solomon Atnafu

TL;DR
This paper introduces a Convolutional Vision Transformer that combines a CNN with a ViT for Deepfake detection, reaching 91.5% accuracy on the DFDC dataset and addressing the challenge of identifying realistic manipulated videos.
Contribution
It proposes a new CNN-augmented Vision Transformer architecture for Deepfake detection, demonstrating improved performance over existing methods on the DFDC dataset.
Findings
Achieved 91.5% accuracy on the DFDC dataset
Attained an AUC of 0.91
Reached a loss value of 0.32
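The figures above are standard binary-classification metrics. As a rough illustration of how such numbers are computed from per-sample fake probabilities, here is a minimal sketch in plain Python. The toy labels and scores are invented for the example, the function names are hypothetical, and the use of binary cross-entropy as the loss is an assumption, since this page does not name the loss function.

```python
import math


def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label (1 = fake, 0 = real)."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels, scores):
    """AUC as the chance that a random fake scores above a random real sample (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def binary_cross_entropy(labels, scores, eps=1e-12):
    """Mean binary cross-entropy; assumed loss, not confirmed by this page."""
    total = 0.0
    for y, s in zip(labels, scores):
        s = min(max(s, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(s) + (1 - y) * math.log(1.0 - s))
    return total / len(labels)


# Invented toy data: 1 = fake, 0 = real; scores are predicted fake probabilities.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]

print("accuracy:", accuracy(toy_labels, toy_scores))
print("AUC:", roc_auc(toy_labels, toy_scores))
print("BCE loss:", binary_cross_entropy(toy_labels, toy_scores))
```

Accuracy depends on the chosen decision threshold, while AUC depends only on how the scores rank fake samples relative to real ones, which is why the two are usually reported together.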
Abstract
The rapid advancement of deep learning models that can generate and synthesize hyper-realistic videos, known as Deepfakes, and their easy accessibility to the general public have raised concern among all stakeholders about their possible malicious use. Deep learning techniques can now generate faces, swap faces between two subjects in a video, alter facial expressions, change gender, and alter facial features, to list a few. These powerful video manipulation methods have potential use in many fields. However, they also pose a looming threat to everyone if used for harmful purposes such as identity theft, phishing, and scams. In this work, we propose a Convolutional Vision Transformer for the detection of Deepfakes. The Convolutional Vision Transformer has two components: a Convolutional Neural Network (CNN) and a Vision Transformer (ViT). The CNN extracts learnable features while the ViT…
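The two-component design named in the abstract can be pictured with a small PyTorch sketch: a CNN stem produces a feature map, each spatial position of that map becomes a token, and a Transformer encoder classifies the token sequence. This is an illustration under stated assumptions, not the authors' implementation. The class name `ConvViTSketch`, the layer sizes, the 56x56 input resolution, and the use of `nn.TransformerEncoder` are all hypothetical choices made for brevity.

```python
import torch
import torch.nn as nn


class ConvViTSketch(nn.Module):
    """Hypothetical CNN + Transformer-encoder classifier; all sizes are illustrative."""

    def __init__(self, embed_dim=64, depth=2, heads=4, num_classes=2, max_tokens=196):
        super().__init__()
        # CNN stem: learns a feature map from the raw face crop.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, embed_dim, kernel_size=3, padding=1),
            nn.BatchNorm2d(embed_dim),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Learnable class token and learned absolute position encodings.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, max_tokens + 1, embed_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim,
            nhead=heads,
            dim_feedforward=4 * embed_dim,
            dropout=0.1,
            batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Sequential(nn.LayerNorm(embed_dim), nn.Linear(embed_dim, num_classes))

    def forward(self, x):
        feats = self.cnn(x)                         # (batch, channels, height, width)
        tokens = feats.flatten(2).transpose(1, 2)   # (batch, height * width, channels)
        cls = self.cls_token.expand(tokens.size(0), -1, -1)
        tokens = torch.cat([cls, tokens], dim=1)    # prepend the class token
        tokens = tokens + self.pos_embed[:, : tokens.size(1)]
        encoded = self.encoder(tokens)
        return self.head(encoded[:, 0])             # logits read from the class token


model = ConvViTSketch()
model.eval()
with torch.no_grad():
    # Two random 56x56 RGB "face crops"; two pooling steps give a 14x14 map, so 196 tokens.
    logits = model(torch.randn(2, 3, 56, 56))
    probs = logits.softmax(dim=-1)  # per-class probabilities (real vs. fake)
```

Letting the CNN produce the tokens, instead of slicing raw pixels into patches, means the Transformer attends over learned local features. Training such a model would typically pair these logits with a cross-entropy loss and an optimizer such as Adam.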
Taxonomy
Topics: Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Dropout · Layer Normalization · Attention Is All You Need · Dense Connections · Softmax · Adam
