DinoLizer: Learning from the Best for Generative Inpainting Localization

Minh Thong Doi (IMT Nord Europe; CRIStAL); Jan Butora (CRIStAL); Vincent Itier (IMT Nord Europe; CRIStAL); Jérémie Boulanger (CRIStAL); Patrick Bas (CRIStAL)

arXiv:2511.20722 · cs.CV · November 27, 2025

TL;DR

DinoLizer leverages a DINOv2-based model with a linear head and sliding-window approach to accurately localize manipulated regions in generative inpainting, outperforming existing methods and showing robustness to common post-processing techniques.

Contribution

The paper introduces DinoLizer, a novel DINOv2-based framework with a linear classification head and sliding-window strategy for improved manipulation localization in generative inpainting.
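The linear head can be pictured as a per-patch logistic classifier applied to every DINOv2 patch embedding, so each 14 × 14 pixel patch receives its own manipulation score. The following is a minimal sketch under stated assumptions, not the authors' code. It assumes a single-logit binary head with a sigmoid, although the summary says only "linear classification head". It also assumes that CLS and any register tokens have already been dropped, and it uses toy random embeddings. The helper names `sigmoid`, `linear_head`, and `to_patch_grid` are hypothetical.

```python
import math
import random

PATCH = 14  # patch size stated in the abstract: one prediction per 14 x 14 pixels


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def linear_head(patch_tokens, weights, bias):
    """Hypothetical binary head: P(manipulated) = sigmoid(w . e + b) for each patch embedding e."""
    return [sigmoid(sum(w * e for w, e in zip(weights, token)) + bias) for token in patch_tokens]


def to_patch_grid(values, height, width, patch=PATCH):
    """Reshape flat per-patch scores (row-major token order) into a rows x cols grid."""
    rows, cols = height // patch, width // patch
    if len(values) != rows * cols:
        raise ValueError("expected one score per patch")
    return [values[r * cols:(r + 1) * cols] for r in range(rows)]


if __name__ == "__main__":
    random.seed(0)
    dim = 8  # toy embedding size; real DINOv2 embeddings are much wider
    n_patches = (224 // PATCH) ** 2  # 16 x 16 patches for an assumed 224 x 224 crop
    tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_patches)]
    head_w = [random.gauss(0, 0.1) for _ in range(dim)]
    grid = to_patch_grid(linear_head(tokens, head_w, 0.0), 224, 224)
    print(len(grid), len(grid[0]))  # 16 16
```

Keeping the head linear means all of the spatial and semantic reasoning has to come from the frozen or fine-tuned backbone features. The head itself only learns a single direction in embedding space.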

Findings

1. DinoLizer surpasses state-of-the-art local manipulation detectors in localization accuracy.
2. It remains robust under resizing, noise, and compression.
3. It achieves 12% higher IoU than previous models.
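IoU (intersection over union) compares a predicted binary manipulation mask with the ground-truth mask. The minimal sketch below handles a single image pair. The summary does not say whether IoU is averaged per image or pooled over a dataset. Scoring two empty masks as 1.0 is a convention assumed here, not taken from the paper. The function name `mask_iou` is hypothetical.

```python
def mask_iou(pred, truth):
    """Intersection over union of two same-shaped binary masks given as lists of 0/1 rows."""
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")
    inter = union = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            inter += int(bool(p) and bool(t))  # pixel flagged in both masks
            union += int(bool(p) or bool(t))   # pixel flagged in at least one mask
    # Convention assumed here: two empty masks agree perfectly.
    return inter / union if union else 1.0


if __name__ == "__main__":
    predicted = [[1, 1, 0],
                 [0, 0, 0]]
    ground_truth = [[1, 0, 0],
                    [1, 0, 0]]
    print(mask_iou(predicted, ground_truth))  # 0.3333333333333333 (1 shared pixel / 3 in the union)
```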

Abstract

We introduce DinoLizer, a DINOv2-based model for localizing manipulated regions in generative inpainting. Our method builds on a DINOv2 model pretrained to detect synthetic images on the B-Free dataset. We add a linear classification head on top of the Vision Transformer's patch embeddings to predict manipulations at a $14 \times 14$ patch resolution. The head is trained to focus on semantically altered regions, treating non-semantic edits as part of the original content. Because the ViT accepts only fixed-size inputs, we use a sliding-window strategy to aggregate predictions over larger images; the resulting heatmaps are post-processed to refine the estimated binary manipulation masks. Empirical results show that DinoLizer surpasses state-of-the-art local manipulation detectors on a range of inpainting datasets derived from different generative models. It remains robust to common…
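The sliding-window aggregation and the final binarization can be sketched as follows. This is an illustration under assumptions, not the authors' procedure. The following are all hypothetical choices:

* a 224-pixel window and 112-pixel stride
* averaging of overlapping predictions
* a fixed 0.5 threshold, standing in for the unspecified post-processing

`predict_window` represents running the DINOv2 model and linear head on one crop; here it is a stub. Patch-level scores are mapped to pixels by simple repetition over each 14 × 14 block.

```python
PATCH = 14  # patch size from the abstract (one prediction per 14 x 14 pixels)


def repeat_patches(grid, patch=PATCH):
    """Upsample a patch-level grid to pixels by repeating each value over a patch x patch block."""
    return [[v for v in row for _ in range(patch)] for row in grid for _ in range(patch)]


def window_starts(length, window, stride):
    """Offsets that cover [0, length), always including a window flush with the far edge."""
    if length <= window:
        return [0]  # assumption: the caller pads images smaller than the window
    starts = list(range(0, length - window + 1, stride))
    if starts[-1] != length - window:
        starts.append(length - window)
    return starts


def sliding_window_heatmap(height, width, predict_window, window=224, stride=112):
    """Average overlapping per-window pixel scores into one height x width heatmap.

    predict_window(top, left) is a hypothetical callback returning a window x window
    list of per-pixel scores for the crop whose top-left corner is (top, left).
    """
    if stride > window:
        raise ValueError("stride larger than window would leave pixels uncovered")
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for top in window_starts(height, window, stride):
        for left in window_starts(width, window, stride):
            scores = predict_window(top, left)
            for dy in range(min(window, height - top)):
                for dx in range(min(window, width - left)):
                    total[top + dy][left + dx] += scores[dy][dx]
                    count[top + dy][left + dx] += 1
    return [[t / c for t, c in zip(t_row, c_row)] for t_row, c_row in zip(total, count)]


def binarize(heatmap, threshold=0.5):
    """Turn the averaged heatmap into a binary manipulation mask (1 = manipulated)."""
    return [[1 if v >= threshold else 0 for v in row] for row in heatmap]


if __name__ == "__main__":
    # Stub model: every window whose left edge is 0 predicts "manipulated" everywhere.
    def stub(top, left):
        return repeat_patches([[1.0 if left == 0 else 0.0] * 16 for _ in range(16)])

    heat = sliding_window_heatmap(300, 400, stub)
    mask = binarize(heat)
    print(heat[0][0], heat[100][150], heat[299][399])  # 1.0 0.5 0.0
    print(mask[100][150], mask[0][399])  # 1 0
```

Adding a final window flush with each image edge guarantees that every pixel is covered without padding images larger than the window. Averaging is one simple way to reconcile overlapping windows; taking a maximum would be another.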

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Digital Media Forensic Detection