Cultivating Forensic Reasoning for Generalizable Multimodal Manipulation Detection
Yuchen Zhang, Yaxiong Wang, Kecheng Han, Yujiao Wu, Lianwei Wu, Li Zhu, Zhedong Zheng

TL;DR
This paper introduces REFORM, a reasoning-based framework for multimodal manipulation detection that emphasizes explicit forensic reasoning to improve generalization to unseen manipulation types.
Contribution
The paper proposes a novel three-stage curriculum and a large-scale dataset with reasoning annotations to enhance generalizable manipulation detection.
Findings
REFORM achieves state-of-the-art accuracy on multiple benchmarks.
The approach significantly improves generalization to unseen manipulation patterns.
Extensive experiments validate the effectiveness of explicit forensic reasoning.
Abstract
Recent advances in generative AI have significantly enhanced the realism of multimodal media manipulation, thereby posing substantial challenges to manipulation detection. Existing manipulation detection and grounding approaches predominantly focus on manipulation type classification under result-oriented supervision, which not only lacks interpretability but also tends to overfit to superficial artifacts. In this paper, we argue that generalizable detection requires incorporating explicit forensic reasoning, rather than merely classifying a limited set of manipulation types, which fails to generalize to unseen manipulation patterns. To this end, we propose REFORM, a reasoning-driven framework that shifts learning from outcome fitting to process modeling. REFORM adopts a three-stage curriculum that first induces forensic rationales, then aligns reasoning with final judgments, and finally…
Taxonomy
Topics: Digital Media Forensic Detection · Digital and Cyber Forensics · Generative Adversarial Networks and Image Synthesis
