Edge-Enhanced Vision Transformer Framework for Accurate AI-Generated Image Detection

Dabbrata Das, Mahshar Yahan, Md Tareq Zaman, and Md Rishadul Bayesh

arXiv:2508.17877 · cs.CV · August 26, 2025

TL;DR

This paper introduces a hybrid detection framework combining a fine-tuned Vision Transformer with an edge-based image processing module to improve the accuracy and efficiency of AI-generated image detection.

Contribution

The paper proposes a novel edge-based module, integrated with a fine-tuned ViT, that increases sensitivity to structural cues for more accurate AI-generated image detection.

Findings

01 Achieves 97.75% accuracy on the CIFAKE dataset

02 Outperforms state-of-the-art models in detection performance

03 Demonstrates effectiveness on both images and video frames

Abstract

The rapid advancement of generative models has led to a growing prevalence of highly realistic AI-generated images, posing significant challenges for digital forensics and content authentication. Conventional detection methods mainly rely on deep learning models that extract global features, which often overlook subtle structural inconsistencies and demand substantial computational resources. To address these limitations, we propose a hybrid detection framework that combines a fine-tuned Vision Transformer (ViT) with a novel edge-based image processing module. The edge-based module computes variance from edge-difference maps generated before and after smoothing, exploiting the observation that AI-generated images typically exhibit smoother textures, weaker edges, and reduced noise compared to real images. When applied as a post-processing step on ViT predictions, this module enhances…
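
The edge-based score described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the excerpt does not name the edge operator, the smoothing filter, or how the score is combined with the ViT prediction, so the central-difference gradient, the 3x3 box blur, and the function names `box_blur`, `edge_magnitude`, and `edge_difference_variance` are all assumptions made for this example.

```python
import numpy as np


def box_blur(img: np.ndarray, k: int = 3) -> np.ndarray:
    """Mean filter with a k x k window (assumed smoother, not from the paper)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")  # replicate border pixels
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.float64)
    # Sum the k*k shifted views of the padded image, then average.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


def edge_magnitude(img: np.ndarray) -> np.ndarray:
    """Gradient magnitude via finite differences (assumed edge operator)."""
    gy, gx = np.gradient(img.astype(np.float64))
    return np.hypot(gx, gy)


def edge_difference_variance(img: np.ndarray, k: int = 3) -> float:
    """Variance of the edge map before smoothing minus the edge map after."""
    img = img.astype(np.float64)
    diff = edge_magnitude(img) - edge_magnitude(box_blur(img, k))
    return float(np.var(diff))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Smooth horizontal ramp versus the same ramp with added sensor-like noise.
    smooth = np.tile(np.linspace(0.0, 1.0, 32), (32, 1))
    noisy = smooth + rng.normal(0.0, 0.2, size=smooth.shape)
    print("smooth:", edge_difference_variance(smooth))
    print("noisy: ", edge_difference_variance(noisy))
```

Under these assumptions, a grayscale image with more fine texture and noise loses more edge energy when smoothed and so yields a larger variance, which matches the abstract's observation that AI-generated images tend to have smoother textures and weaker edges. Any decision threshold on this score would have to be chosen empirically; none is given in the excerpt.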
