ManipShield: A Unified Framework for Image Manipulation Detection, Localization and Explanation

Zitong Xu; Huiyu Duan; Xiaoyu Wang; Zhaolin Cai; Kaiwei Zhang; Qiang Hu; Jing Liu; Xiongkuo Min; Guangtao Zhai

arXiv:2511.14259 · cs.CV · January 21, 2026

TL;DR

ManipShield is a unified framework that uses a large-scale benchmark and multimodal large language models to detect, localize, and explain diverse AI-generated image manipulations, with improved generalization and interpretability.

Contribution

The paper introduces ManipBench, a large-scale benchmark for diverse image manipulations, and ManipShield, a unified model that achieves state-of-the-art detection, localization, and explanation performance.

Findings

01. ManipShield outperforms existing methods on ManipBench and public datasets.

02. ManipShield generalizes well to unseen manipulation models.

03. ManipBench provides extensive annotated data for interpretability.

Abstract

With the rapid advancement of generative models, powerful image editing methods now enable diverse and highly realistic image manipulations that far surpass traditional deepfake techniques, posing new challenges for manipulation detection. Existing image manipulation detection and localization (IMDL) benchmarks suffer from limited content diversity, narrow generative-model coverage, and insufficient interpretability, which hinders the generalization and explanation capabilities of current manipulation detection methods. To address these limitations, we introduce ManipBench, a large-scale benchmark for image manipulation detection and localization focusing on AI-edited images. ManipBench contains over 450K manipulated images produced by 25 state-of-the-art image editing models across 12 manipulation categories, among which 100K images are further annotated with bounding boxes,…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection · Cell Image Analysis Techniques