AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models

Ziyin Zhou; Yunpeng Luo; Yuanchen Wu; Ke Sun; Jiayi Ji; Ke Yan; Shouhong Ding; Xiaoshuai Sun; Yunsheng Wu; Rongrong Ji

arXiv:2507.02664·cs.CV·July 8, 2025

TL;DR

This paper presents AIGI-Holmes, a multimodal large language model-based system for detecting AI-generated images, providing explainability and improved generalization through a new dataset, a novel training pipeline, and collaborative inference strategies.

Contribution

The work introduces a comprehensive dataset, a structured annotation method, and a three-stage training framework to enhance explainability and generalization in AI-generated image detection.

Findings

01. AIGI-Holmes outperforms existing methods on three benchmarks.
02. The collaborative decoding strategy improves detection accuracy.
03. The dataset and training pipeline facilitate human-verifiable explanations.

Abstract

The rapid development of AI-generated content (AIGC) technology has led to the misuse of highly realistic AI-generated images (AIGI) in spreading misinformation, posing a threat to public information security. Although existing AIGI detection techniques are generally effective, they face two issues: 1) a lack of human-verifiable explanations, and 2) poor generalization to the latest generation technologies. To address these issues, we introduce a large-scale and comprehensive dataset, Holmes-Set, which includes the Holmes-SFTSet, an instruction-tuning dataset with explanations on whether images are AI-generated, and the Holmes-DPOSet, a human-aligned preference dataset. Our work introduces an efficient data annotation method called the Multi-Expert Jury, enhancing data generation through structured MLLM explanations and quality control via cross-model evaluation, expert defect…

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Radiomics and Machine Learning in Medical Imaging