Towards Generalized Image Manipulation Localization via Score-based Model

Yunfei Wang; Bo Du; Zhe Yang; Xin Liu; Zhiyu Lin; Tianxin Xu; and Ji-Zhe Zhou

arXiv:2605.16879·cs.CV·May 19, 2026

TL;DR

This paper introduces DiffIML, a score-based generative modeling framework for image manipulation localization. It improves generalization to unseen manipulation types by modeling the intrinsic geometry of the mask distribution.

Contribution

DiffIML applies score-based generative modeling with structural priors to image manipulation localization, improving generalization and avoiding the overfitting to training artifacts that affects discriminative methods.

Findings

01. DiffIML outperforms state-of-the-art methods on multiple benchmarks.

02. It shows significant generalization to unseen datasets.

03. With the proposed architectural improvements, the framework is efficient and stable.

Abstract

With the rapid evolution of synthetic media, Image Manipulation Localization (IML) has emerged as a critical component in multimedia forensics for ensuring the integrity of digital content. However, generalization remains a core challenge, as existing discriminative methods typically learn a fixed decision boundary that tends to overfit to specific training artifacts and fails to adapt to unseen manipulation types. To address this, we propose DiffIML, a novel framework that introduces score-based generative modeling to IML. Diverging from the direct estimation of hard boundaries, DiffIML approximates the score function, the gradient of the log-likelihood, to capture the intrinsic geometric topology of mask distributions. This paradigm leverages structural priors to iteratively recover coherent masks from noise, thereby circumventing the brittleness associated with discriminative models.…
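
As a rough illustration of the idea in the abstract (following a score, the gradient of the log-likelihood, to move iteratively from noise toward a coherent mask), here is a minimal sketch of annealed Langevin sampling in Python with NumPy. It is not DiffIML's implementation: the score is the analytic score of a toy Gaussian centered on a hand-made 8x8 mask rather than a learned network, and `gaussian_score`, `annealed_langevin`, `toy_mask`, the noise schedule, and the step size are all assumptions chosen for this example.

```python
import numpy as np


def gaussian_score(x, mean, sigma):
    """Analytic score of N(mean, sigma^2 I): grad_x log p(x) = (mean - x) / sigma^2.

    Stand-in for a learned score network (assumption for this toy example).
    """
    return (mean - x) / sigma**2


def annealed_langevin(mean, sigmas, steps_per_level=50, eps=0.1, seed=0):
    """Run Langevin updates over a decreasing sequence of noise levels."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(mean.shape)  # start from pure noise
    for sigma in sigmas:
        step = eps * sigma**2  # step size scaled to the current noise level
        for _ in range(steps_per_level):
            noise = rng.standard_normal(mean.shape)
            # Langevin update: drift along the score plus injected noise.
            x = x + 0.5 * step * gaussian_score(x, mean, sigma) + np.sqrt(step) * noise
    return x


# Hypothetical ground-truth mask: a 4x4 "manipulated" square in an 8x8 image.
toy_mask = np.zeros((8, 8))
toy_mask[2:6, 2:6] = 1.0

# Geometric noise schedule from coarse to fine (assumed values).
sigmas = np.geomspace(1.0, 0.01, num=10)

sample = annealed_langevin(toy_mask, sigmas)
recovered = (sample > 0.5).astype(float)  # threshold to a binary mask
```

In this toy setting the sample settles close to `toy_mask` because the score always points toward it. In a learned model, the score would instead be predicted by a network conditioned on the input image, which is where the structural priors described in the abstract would enter.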
