Off-the-shelf Vision Models Benefit Image Manipulation Localization
Zhengxuan Zhang, Keji Song, Junmin Hu, Ao Luo, Yuezun Li

TL;DR
This paper introduces ReVi, a trainable adapter that leverages off-the-shelf general vision models for image manipulation localization, enabling effective detection without extensive retraining.
Contribution
It presents a novel adapter that repurposes existing vision models for IML, bridging the gap between manipulation detection and semantic understanding.
Findings
ReVi outperforms existing IML methods in experiments.
The adapter effectively disentangles manipulation-specific features.
The approach requires only fine-tuning of the adapter, not full model retraining.
Abstract
Image manipulation localization (IML) and general vision tasks are typically treated as two separate research directions due to the fundamental differences between manipulation-specific and semantic features. In this paper, however, we bridge this gap by introducing a fresh perspective: these two directions are intrinsically connected, and general semantic priors can benefit IML. Building on this insight, we propose a novel trainable adapter (named ReVi) that repurposes existing off-the-shelf general-purpose vision models (e.g., image generation and segmentation networks) for IML. Inspired by robust principal component analysis, the adapter disentangles semantic redundancy from manipulation-specific information embedded in these models and selectively enhances the latter. Unlike existing IML methods that require extensive model redesign and full retraining, our method relies on the…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
