Deepfake Video Detection Using Convolutional Vision Transformer

Deressa Wodajo; Solomon Atnafu

arXiv:2102.11126 · cs.CV · March 12, 2021 · 138 cites

TL;DR

This paper introduces a Convolutional Vision Transformer that combines a CNN with a ViT for Deepfake detection. It reports 91.5% accuracy on the DFDC dataset and targets the problem of identifying realistic manipulated videos.

Contribution

It proposes a new CNN-augmented Vision Transformer architecture for Deepfake detection, demonstrating improved performance over existing methods on the DFDC dataset.

Findings

01 Achieved 91.5% accuracy on DFDC dataset
02 Attained an AUC of 0.91
03 Reduced loss to 0.32
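
The three figures above are standard binary-classification metrics. As a minimal sketch of how such numbers are typically computed, the snippet below evaluates accuracy, ROC AUC, and log loss on a handful of made-up predictions. It assumes the reported loss is binary cross-entropy and that label 1 marks the positive class; neither detail is stated on this page, and the sample labels and probabilities are purely illustrative, not data from the paper.

```python
import math


def accuracy(labels, probs, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    correct = sum(1 for y, p in zip(labels, probs) if (p >= threshold) == bool(y))
    return correct / len(labels)


def log_loss(labels, probs, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def roc_auc(labels, probs):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Illustrative values only (hypothetical, not from the paper).
labels = [1, 1, 0, 0, 1, 0]
probs = [0.9, 0.6, 0.4, 0.7, 0.8, 0.1]

print(f"accuracy = {accuracy(labels, probs):.3f}")
print(f"AUC      = {roc_auc(labels, probs):.3f}")
print(f"log loss = {log_loss(labels, probs):.3f}")
```

The pairwise AUC used here is quadratic in the number of samples, which is fine for illustration; a rank-based computation is the usual choice for a full test set.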

Abstract

The rapid advancement of deep learning models that can generate and synthesize hyper-realistic videos, known as Deepfakes, and their easy availability to the general public have raised concern among all stakeholders about their possible malicious use. Deep learning techniques can now generate faces, swap faces between two subjects in a video, alter facial expressions, change gender, and alter facial features, to list a few. These powerful video manipulation methods have potential uses in many fields. However, they also pose a looming threat to everyone if used for harmful purposes such as identity theft, phishing, and scams. In this work, we propose a Convolutional Vision Transformer for the detection of Deepfakes. The Convolutional Vision Transformer has two components: a Convolutional Neural Network (CNN) and a Vision Transformer (ViT). The CNN extracts learnable features while the ViT…
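
One common way to hand CNN features to a ViT, assumed here rather than taken from the paper, is to cut the feature map into non-overlapping patches and flatten each patch into a token vector. The sketch below illustrates only that step in plain Python. The function name `feature_map_to_tokens`, the 2 x 4 x 4 feature map, and the patch size of 2 are hypothetical choices for illustration, not the model's actual dimensions.

```python
def feature_map_to_tokens(feature_map, patch):
    """Split a C x H x W feature map (nested lists) into flattened patch tokens.

    Tokens are produced in row-major patch order; each token has
    C * patch * patch entries, ordered by channel, then row, then column.
    """
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    if height % patch or width % patch:
        raise ValueError("height and width must be divisible by the patch size")

    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            token = [
                feature_map[c][top + dy][left + dx]
                for c in range(channels)
                for dy in range(patch)
                for dx in range(patch)
            ]
            tokens.append(token)
    return tokens


# Hypothetical 2-channel, 4x4 feature map with distinct values per cell.
feature_map = [
    [[c * 16 + row * 4 + col for col in range(4)] for row in range(4)]
    for c in range(2)
]
tokens = feature_map_to_tokens(feature_map, patch=2)

print(len(tokens), "tokens of length", len(tokens[0]))
```

In a typical ViT, each token would then pass through a linear projection and receive a position encoding before entering the attention layers; those steps are omitted from this sketch.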

Code & Models

Repositories

erprogs/CViT (PyTorch, official)

Taxonomy

Topics: Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Dropout · Layer Normalization · Attention Is All You Need · Dense Connections · Softmax · Adam