Beyond Semantics: Uncovering the Physics of Fakes via Universal Physical Descriptors for Cross-Modal Synthetic Detection
Mei Qiu, Jianqiang Zhao, Yanyun Qu

TL;DR
This paper identifies stable physical features that distinguish AI-generated images from real ones across diverse datasets and integrates them into multimodal models like CLIP to improve synthetic image detection.
Contribution
It introduces a novel feature selection algorithm for robust physical features and demonstrates their integration into CLIP for state-of-the-art detection performance.
Findings
Identified five core physical features with consistent discriminative power.
Achieved 99.8% accuracy on multiple GenImage benchmarks.
Enhanced detection by combining pixel-level features with semantic information.
Abstract
The rapid advancement of AI-generated content (AIGC) has blurred the boundaries between real and synthetic images, exposing the limitations of existing deepfake detectors that often overfit to specific generative models. This adaptability crisis calls for a fundamental reexamination of the intrinsic physical characteristics that distinguish natural from AI-generated images. In this paper, we address two critical research questions: (1) What physical features can stably and robustly discriminate AI-generated images across diverse datasets and generative architectures? (2) Can these objective pixel-level features be integrated into multimodal models like CLIP to enhance detection performance while mitigating the unreliability of language-based information? To answer these questions, we conduct a comprehensive exploration of 15 physical features across more than 20 datasets generated by…
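The abstract above does not describe the selection procedure, so the following is only an illustrative sketch of what "stable across datasets" could mean in practice, not the authors' algorithm. It assumes a worst-case criterion: each feature is scored by its weakest real-versus-synthetic separability (AUC minus 0.5) over all datasets, with the direction of separation required to stay consistent. The function names, the synthetic data, and the criterion itself are assumptions made for illustration.

```python
import numpy as np

def auc(scores, labels):
    """Rank-based AUC (Mann-Whitney U). Ties are not averaged,
    which is acceptable for continuous-valued features."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    u = ranks[pos].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

def select_stable_features(datasets, k):
    """Hypothetical stability criterion (an assumption, not the paper's method).

    datasets: list of (X, y), X of shape (n_samples, n_features), y in {0, 1}.
    Returns the indices of the k features whose worst-case, direction-consistent
    separability across datasets is highest, plus the per-feature scores.
    """
    signed = np.stack([
        np.array([auc(X[:, j], y) - 0.5 for j in range(X.shape[1])])
        for X, y in datasets
    ])  # shape: (n_datasets, n_features)
    # A feature scores well only if it separates in the SAME direction everywhere.
    score = np.maximum(signed.min(axis=0), (-signed).min(axis=0))
    return np.argsort(-score)[:k], score

def make_dataset(rng, shifts, n=500):
    """Synthetic stand-in data: label 1 samples are shifted per feature."""
    y = np.repeat([0, 1], n)
    X = rng.normal(size=(2 * n, len(shifts)))
    X[y == 1] += np.asarray(shifts)
    return X, y

rng = np.random.default_rng(0)
# Feature 0: consistent shift; 1: useful in one dataset only;
# 2: direction flips between datasets; 3: pure noise.
datasets = [
    make_dataset(rng, [1.0, 1.0, 1.0, 0.0]),
    make_dataset(rng, [1.0, 0.0, -1.0, 0.0]),
    make_dataset(rng, [1.0, 0.0, 1.0, 0.0]),
]
top, score = select_stable_features(datasets, k=1)
print("selected feature indices:", top)
```

In this toy setup only the consistently shifted feature survives: a feature that helps on one dataset, or whose direction flips between generators, receives a score near or below zero. A real pipeline would replace the synthetic columns with measured pixel-level descriptors.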
