A Timely Survey on Vision Transformer for Deepfake Detection

Zhikan Wang; Zhongyao Cheng; Jiajie Xiong; Xun Xu; Tianrui Li; Bharadwaj Veeravalli; Xulei Yang

arXiv:2405.08463·cs.CV·May 15, 2024

Open Access

TL;DR

This survey reviews Vision Transformer-based methods for deepfake detection, highlighting their architectures, performance, and future research directions to address the growing challenges of deepfake technology.

Contribution

It provides a comprehensive overview and categorization of ViT-based deepfake detection models, analyzing their structures and outlining future research pathways.

Findings

01. ViT-based models show superior performance in deepfake detection, notably in generality and efficiency

02. Detection models are categorized into standalone, sequential, and parallel architectures

03. Guidance for future research on ViT applications in deepfake detection

Abstract

In recent years, the rapid advancement of deepfake technology has revolutionized content creation, lowering forgery costs while elevating quality. However, this progress brings forth pressing concerns such as infringements on individual rights, national security threats, and risks to public safety. To counter these challenges, various detection methodologies have emerged, with Vision Transformer (ViT)-based approaches showcasing superior performance in generality and efficiency. This survey presents a timely overview of ViT-based deepfake detection models, categorized into standalone, sequential, and parallel architectures. Furthermore, it succinctly delineates the structure and characteristics of each model. By analyzing existing research and addressing future directions, this survey aims to equip researchers with a nuanced understanding of ViT's pivotal role in deepfake detection,…
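The three architecture categories named in the abstract can be illustrated with a minimal composition sketch. The abstract does not define the categories, so the reading used here is an assumption: "standalone" feeds image patches straight into a transformer, "sequential" runs a CNN first and passes its features to a transformer, and "parallel" runs both branches side by side and fuses their outputs. All function names are hypothetical, and the toy stand-ins below only mimic the data flow; they are not real CNN or attention layers and the scores they produce carry no meaning.

```python
from typing import List

Image = List[List[float]]   # toy grayscale image, pixel values in [0, 1]
Features = List[float]


def cnn_features(image: Image) -> Features:
    """Toy stand-in for a CNN backbone: one local summary (row mean) per row."""
    return [sum(row) / len(row) for row in image]


def patch_tokens(image: Image) -> Features:
    """Toy stand-in for ViT patch embedding: flatten pixels into a token list."""
    return [pixel for row in image for pixel in row]


def transformer(tokens: Features) -> Features:
    """Toy stand-in for a transformer encoder: blend each token with the
    global mean, a crude proxy for global self-attention."""
    mean = sum(tokens) / len(tokens)
    return [0.5 * t + 0.5 * mean for t in tokens]


def classify(features: Features) -> float:
    """Toy classification head: mean feature value clipped to [0, 1]."""
    mean = sum(features) / len(features)
    return min(1.0, max(0.0, mean))


def standalone(image: Image) -> float:
    # Assumed reading: the transformer is the only feature extractor.
    return classify(transformer(patch_tokens(image)))


def sequential(image: Image) -> float:
    # Assumed reading: CNN features are fed into the transformer in series.
    return classify(transformer(cnn_features(image)))


def parallel(image: Image) -> float:
    # Assumed reading: CNN and transformer branches run independently,
    # and their features are fused by concatenation before classification.
    fused = cnn_features(image) + transformer(patch_tokens(image))
    return classify(fused)
```

In a real detector each stand-in would be replaced by a trained module; the point of the sketch is only where the transformer sits relative to the convolutional branch in each category.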

Taxonomy

Topics: Industrial Vision Systems and Defect Detection · Currency Recognition and Detection · Digital Media Forensic Detection

Methods: Position-Wise Feed-Forward Layer · Dropout · Label Smoothing · Absolute Position Encodings · Byte Pair Encoding · Adam · Softmax · Attention Is All You Need · Layer Normalization · Linear Layer