Combining EfficientNet and Vision Transformers for Video Deepfake Detection

Davide Coccomini; Nicola Messina; Claudio Gennaro; Fabrizio Falchi

arXiv:2107.02612 · cs.CV · June 29, 2022

TL;DR

This paper explores combining EfficientNet and Vision Transformers for improved video deepfake detection, achieving results comparable to state-of-the-art methods without using distillation or ensembles.

Contribution

It introduces a novel combination of EfficientNet and Vision Transformers for deepfake detection, with a simple voting scheme for multiple faces, avoiding complex ensemble techniques.
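The summary above does not spell out the voting rule, so the following is only a minimal sketch of one plausible reading: per-face fake probabilities are averaged for each person in the video, and the video is flagged as fake if any person's average exceeds a threshold. The function name `classify_video`, the identity keys, and the 0.5 threshold are all illustrative assumptions, not details taken from the paper.

```python
from statistics import mean


def classify_video(face_scores_by_identity, threshold=0.5):
    """Aggregate per-face scores into a single video-level decision.

    face_scores_by_identity: dict mapping a (hypothetical) identity id to a
    list of per-face fake probabilities in [0, 1], one per sampled frame.
    The aggregation rule here is an assumption made for illustration.
    """
    # Average the scores of each identity; skip identities with no detections.
    per_identity = {
        identity: mean(scores)
        for identity, scores in face_scores_by_identity.items()
        if scores
    }
    # The video counts as fake if at least one identity looks fake on average.
    is_fake = any(score > threshold for score in per_identity.values())
    return is_fake, per_identity


# Example: two people in the video, only the second one appears manipulated.
is_fake, per_identity = classify_video({"a": [0.1, 0.2], "b": [0.9, 0.7]})
```

Averaging within an identity before thresholding keeps a single noisy frame from flipping the decision, while the "any identity" rule still catches videos in which only one of several faces was altered.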

Findings

01. Achieved an AUC of 0.951 on the DFDC dataset.

02. Achieved an F1 score of 88.0%, close to the state of the art.

03. The method is straightforward and does not rely on distillation or ensembles.
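For readers unfamiliar with the two reported metrics, the sketch below shows how AUC and F1 are typically computed from video-level labels and scores. It uses the standard definitions (F1 as the harmonic mean of precision and recall, AUC as the probability that a fake video is scored above a real one); the function names and the toy data are illustrative and are not taken from the paper's evaluation code.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels, with 1 = fake and 0 = real."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as the fraction of (fake, real) pairs ranked correctly.

    Ties between a fake and a real score count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one fake and one real example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data, made up for illustration only.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]
```

AUC is independent of any decision threshold, whereas F1 depends on the threshold used to turn scores into predictions, which is why the two are usually reported together.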

Abstract

Deepfakes are the result of digital manipulation to forge realistic yet fake imagery. With the astonishing advances in deep generative models, fake images or videos are nowadays obtained using variational autoencoders (VAEs) or Generative Adversarial Networks (GANs). These technologies are becoming more accessible and accurate, resulting in fake videos that are very difficult to detect. Traditionally, Convolutional Neural Networks (CNNs) have been used to perform video deepfake detection, with the best results obtained using methods based on EfficientNet B7. In this study, we focus on video deepfake detection on faces, given that most methods are becoming extremely accurate in the generation of realistic human faces. Specifically, we combine various types of Vision Transformers with a convolutional EfficientNet B0 used as a feature extractor, obtaining comparable results with some…

Taxonomy

Methods: Pointwise Convolution · Depthwise Convolution · Depthwise Separable Convolution · Batch Normalization · Sigmoid Activation · Dropout · Inverted Residual Block · 1x1 Convolution · Convolution