AI-Powered Deepfake Detection Using CNN and Vision Transformer Architectures

Sifatullah Sheikh Urmi, Kirtonia Nuzath Tabassum Arthi, and Md Al-Imran

arXiv:2601.01281 · cs.CV · January 6, 2026

Open Access

TL;DR

This paper evaluates four AI-based models, three CNNs and one Vision Transformer, for deepfake detection and finds that data preprocessing, augmentation, and model choice significantly affect accuracy and efficiency.

Contribution

The paper presents a comparative analysis of CNN and Vision Transformer architectures for deepfake detection and highlights the effectiveness of data augmentation techniques.

Findings

01. VFDNET with MobileNetV3 achieved the highest accuracy

02. Data preprocessing improved model performance

03. The Vision Transformer showed promising results in detection accuracy

Abstract

The increasing use of AI-generated deepfakes creates major challenges for maintaining digital authenticity. Four AI-based models, consisting of three CNNs and one Vision Transformer, were evaluated on large face image datasets. Data preprocessing and augmentation techniques improved model performance across different scenarios. VFDNET demonstrated superior accuracy with MobileNetV3 and showed efficient performance, demonstrating AI's capability for dependable deepfake detection.

Taxonomy

Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications