Fixed-Threshold Evaluation of a Hybrid CNN-ViT for AI-Generated Image Detection Across Photos and Art
Md Ashik Khan, Arafat Alam Jion

TL;DR
This paper introduces a fixed-threshold evaluation protocol for AI-generated image detectors. The protocol exposes how robust these detectors actually are to post-processing transformations and guides deployment choices among CNNs, ViTs, and hybrids.
Contribution
The paper proposes a fixed-threshold evaluation method that avoids inflated robustness estimates. It uses this method to systematically compare CNN, ViT, and hybrid models under post-processing such as JPEG compression, blur, and downscaling.
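The following is a minimal sketch of why per-condition retuning can inflate robustness estimates. It uses only the Python standard library. The following are assumptions and are not taken from the paper: detector scores lie in [0, 1], label 1 means AI-generated, and accuracy serves as the metric. The "retuned" baseline tunes its threshold directly on the degraded data, which is a simplification of per-condition retuning. The function names are hypothetical, and the scores are toy values rather than results.

```python
from typing import Dict, Sequence, Tuple

# (scores, labels); label 1 = AI-generated (assumed convention)
Split = Tuple[Sequence[float], Sequence[int]]


def accuracy_at(scores: Sequence[float], labels: Sequence[int], t: float) -> float:
    """Accuracy when a sample is flagged as AI-generated iff its score >= t."""
    correct = sum((s >= t) == (y == 1) for s, y in zip(scores, labels))
    return correct / len(labels)


def tune_threshold(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Return the observed score that maximizes accuracy (the lowest one on ties)."""
    return max(sorted(set(scores)), key=lambda t: accuracy_at(scores, labels, t))


def fixed_vs_retuned(clean_val: Split, conditions: Dict[str, Split]):
    """Compare a threshold frozen on clean validation data with per-condition retuning."""
    t_fixed = tune_threshold(*clean_val)  # chosen once, on clean data only
    report = {}
    for name, (scores, labels) in conditions.items():
        fixed = accuracy_at(scores, labels, t_fixed)  # fixed-threshold protocol
        retuned = accuracy_at(scores, labels, tune_threshold(scores, labels))  # retuned per condition
        report[name] = (fixed, retuned)
    return t_fixed, report


# Hypothetical toy scores: compression pushes every score down.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
clean = ([0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9], labels)
jpeg = ([0.05, 0.1, 0.15, 0.2, 0.3, 0.35, 0.4, 0.45], labels)
t, report = fixed_vs_retuned(clean, {"clean": clean, "jpeg": jpeg})
print(t, report)  # 0.6 {'clean': (1.0, 1.0), 'jpeg': (0.5, 1.0)}
```

When the threshold is frozen at 0.6, the hypothetical JPEG condition falls to chance-level accuracy. Retuning on the same scores reports a perfect score instead. This gap is the kind of masked deployment failure that a fixed-threshold protocol is meant to expose.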
Findings
Frequency-aided CNNs excel on pristine photos but collapse under compression.
ViTs maintain high performance across transformations.
Hybrid models provide balanced performance across domains (photos and art).
Abstract
AI image generators create both photorealistic images and stylized art, necessitating robust detectors that maintain performance under common post-processing transformations (JPEG compression, blur, downscaling). Existing methods optimize single metrics without addressing deployment-critical factors such as operating point selection and fixed-threshold robustness. This work addresses misleading robustness estimates by introducing a fixed-threshold evaluation protocol that holds decision thresholds, selected once on clean validation data, fixed across all post-processing transformations. Traditional methods retune thresholds per condition, artificially inflating robustness estimates and masking deployment failures. We report deployment-relevant performance at three operating points (Low-FPR, ROC-optimal, Best-F1) under systematic degradation testing using a lightweight CNN-ViT hybrid…
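The three operating points can be illustrated with a short sketch that selects each threshold once from clean validation scores. The exact definitions below are assumptions and are not the paper's stated criteria:

- Low-FPR is the lowest threshold whose false-positive rate stays within a hypothetical budget `max_fpr` (1% here).
- ROC-optimal is the threshold that maximizes Youden's J = TPR - FPR.
- Best-F1 is the threshold that maximizes F1.

The function names and data are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def confusion_at(scores: Sequence[float], labels: Sequence[int], t: float) -> Tuple[int, int, int]:
    """Count (TP, FP, FN) when a sample is flagged as AI-generated (label 1) iff score >= t."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
    return tp, fp, fn


def select_operating_points(scores: Sequence[float], labels: Sequence[int],
                            max_fpr: float = 0.01) -> Dict[str, float]:
    """Pick Low-FPR, ROC-optimal (Youden's J), and Best-F1 thresholds on clean validation data."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("validation set needs both real (0) and generated (1) samples")
    rows = []
    # Candidate thresholds: every observed score, plus +inf ("flag nothing").
    for t in sorted(set(scores)) + [math.inf]:
        tp, fp, fn = confusion_at(scores, labels, t)
        tpr, fpr = tp / pos, fp / neg
        f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
        rows.append((t, tpr, fpr, f1))
    # FPR never increases as t grows, so the smallest admissible threshold keeps the
    # highest TPR within the budget; +inf always qualifies, so the set is never empty.
    low_fpr = min(t for t, _, fpr, _ in rows if fpr <= max_fpr)
    roc_optimal = max(rows, key=lambda r: r[1] - r[2])[0]
    best_f1 = max(rows, key=lambda r: r[3])[0]
    return {"low_fpr": low_fpr, "roc_optimal": roc_optimal, "best_f1": best_f1}


# Hypothetical clean validation scores (20 real, 4 generated).
real = [0.1] * 17 + [0.5, 0.6, 0.75]
fake = [0.4, 0.7, 0.8, 0.9]
points = select_operating_points(real + fake, [0] * len(real) + [1] * len(fake))
print(points)  # {'low_fpr': 0.8, 'roc_optimal': 0.4, 'best_f1': 0.7}
```

Under the fixed-threshold protocol, these three values would be selected once and then applied unchanged to the compressed, blurred, and downscaled versions of the test data, without retuning per condition.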
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection · Aesthetic Perception and Analysis
