TL;DR
This paper explores combining EfficientNet and Vision Transformers for improved video deepfake detection, achieving results comparable to state-of-the-art methods without using distillation or ensembles.
Contribution
It introduces a novel combination of EfficientNet and Vision Transformers for deepfake detection, with a simple voting scheme for multiple faces, avoiding complex ensemble techniques.
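The summary above does not spell out the voting rule, so the following is only a minimal sketch of one plausible scheme, not the authors' confirmed implementation. It assumes each detected face crop comes with an identity label and a per-face fake probability, averages the scores per identity, and flags the video as fake if any identity's average exceeds a threshold. The function name `video_is_fake`, the `(identity, score)` input format, and the 0.5 threshold are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def video_is_fake(face_scores, threshold=0.5):
    """Aggregate per-face fake probabilities into one video-level decision.

    face_scores: iterable of (identity, score) pairs, one per face crop,
    where score is the classifier's fake probability in [0, 1].
    Assumed rule (not taken from the paper): average the scores of each
    identity, then call the video fake if any identity's average is
    above the threshold.
    """
    per_identity = defaultdict(list)
    for identity, score in face_scores:
        per_identity[identity].append(score)
    if not per_identity:
        raise ValueError("no face scores were provided")

    averaged = {who: mean(scores) for who, scores in per_identity.items()}
    is_fake = any(avg > threshold for avg in averaged.values())
    return is_fake, averaged


# Two people appear in the clip; only person "b" looks manipulated.
scores = [("a", 0.25), ("a", 0.25), ("b", 0.75), ("b", 1.0)]
is_fake, averaged = video_is_fake(scores)
print(is_fake, averaged)
```

Averaging per identity rather than over all faces keeps one manipulated person from being diluted by the unmanipulated faces that share the frame.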
Findings
Achieved an AUC of 0.951 on the DFDC dataset.
Achieved an F1 score of 88.0%, close to the state of the art.
The method is straightforward and does not rely on distillation or ensembles.
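For readers unfamiliar with the two metrics reported above, here is a small illustrative sketch of how AUC and F1 are computed from video-level labels and scores. The toy labels and scores are made up for illustration and are not the paper's data, and the 0.5 decision threshold for F1 is an assumption.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random fake (label 1) scores higher
    than a random real video (label 0); tied scores count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(labels, scores, threshold=0.5):
    """F1 is the harmonic mean of precision and recall for the fake class."""
    preds = [int(s > threshold) for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: 1 = fake, 0 = real.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(labels, scores))   # 0.75
print(f1_score(labels, scores))  # 0.5
```

AUC is threshold-free, while F1 depends on the chosen cutoff, which is why the two numbers are reported separately.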
Abstract
Deepfakes are the result of digital manipulation to forge realistic yet fake imagery. With the astonishing advances in deep generative models, fake images or videos are nowadays obtained using variational autoencoders (VAEs) or Generative Adversarial Networks (GANs). These technologies are becoming more accessible and accurate, resulting in fake videos that are very difficult to detect. Traditionally, Convolutional Neural Networks (CNNs) have been used to perform video deepfake detection, with the best results obtained using methods based on EfficientNet B7. In this study, we focus on video deepfake detection on faces, given that most methods are becoming extremely accurate in the generation of realistic human faces. Specifically, we combine various types of Vision Transformers with a convolutional EfficientNet B0 used as a feature extractor, obtaining comparable results with some…
Code & Models
- davide-coccomini/Combining-EfficientNet-and-Vision-Transformers-for-Video-Deepfake-Detection (PyTorch, official)
- davide-coccomini/mintime-multi-identity-size-invariant-timesformer-for-video-deepfake-detection (PyTorch)
Taxonomy
Methods: Pointwise Convolution · Depthwise Convolution · Depthwise Separable Convolution · Batch Normalization · Sigmoid Activation · Dropout · Inverted Residual Block · 1x1 Convolution · Convolution
