Seeing is not always believing: Benchmarking Human and Model Perception of AI-Generated Images
Zeyu Lu, Di Huang, Lei Bai, Jingjing Qu, Chengyue Wu, Xihui Liu, Wanli Ouyang

TL;DR
This paper benchmarks how well humans and AI detectors distinguish real photos from AI-generated images. Humans misclassify 38.7% of the images, and the top detection model still fails on 13% under the same conditions, which highlights the need for better fake image detection.
Contribution
The paper introduces Fake2M, a large-scale fake image dataset, and uses it to benchmark both human perception (HPBench) and AI detection models, giving insight into current limitations and the performance gap between the two.
Findings
Humans have a 38.7% misclassification rate in distinguishing real from AI-generated images.
Top AI detection models have a 13% failure rate under the same evaluation conditions.
The study raises awareness of risks associated with AI-generated fake images.
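The two rates above are the same kind of quantity: the fraction of images whose predicted label (real or fake) differs from the true label. A minimal sketch of that arithmetic follows. The function name `misclassification_rate` and the toy labels are hypothetical and invented for illustration only; they are not data or code from the paper.

```python
def misclassification_rate(labels, predictions):
    """Return the fraction of items whose prediction differs from the true label."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    if not labels:
        raise ValueError("at least one judgment is required")
    wrong = sum(1 for truth, guess in zip(labels, predictions) if truth != guess)
    return wrong / len(labels)


# Toy ground truth for eight images (illustrative, not from Fake2M).
labels = ["real", "fake", "fake", "real", "fake", "real", "fake", "real"]

# Hypothetical human judgments: three of the eight are wrong.
human_guesses = ["real", "real", "fake", "fake", "fake", "real", "real", "real"]

# Hypothetical detector outputs: one of the eight is wrong.
detector_guesses = ["real", "fake", "fake", "real", "fake", "real", "real", "real"]

human_rate = misclassification_rate(labels, human_guesses)
detector_rate = misclassification_rate(labels, detector_guesses)
print(f"human: {human_rate:.1%}, detector: {detector_rate:.1%}")
```

Comparing the two rates on the same image set is what makes the human and model numbers directly comparable.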
Abstract
Photos serve as a way for humans to record what they experience in their daily lives, and they are often regarded as trustworthy sources of information. However, there is a growing concern that the advancement of artificial intelligence (AI) technology may produce fake photos, which can create confusion and diminish trust in photographs. This study aims to comprehensively evaluate agents for distinguishing state-of-the-art AI-generated visual content. Our study benchmarks both human capability and cutting-edge fake image detection AI algorithms, using a newly collected large-scale fake image dataset Fake2M. In our human perception evaluation, titled HPBench, we discovered that humans struggle significantly to distinguish real photos from AI-generated ones, with a misclassification rate of 38.7%. Along with this, we conduct the model capability of AI-Generated images detection evaluation…
Taxonomy
Topics: Misinformation and Its Impacts · Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection
