DinoLizer: Learning from the Best for Generative Inpainting Localization
Minh Thong Doi (IMT Nord Europe, CRIStAL), Jan Butora (CRIStAL), Vincent Itier (IMT Nord Europe, CRIStAL), Jérémie Boulanger (CRIStAL), Patrick Bas (CRIStAL)

TL;DR
DinoLizer leverages a DINOv2-based model with a linear head and sliding-window approach to accurately localize manipulated regions in generative inpainting, outperforming existing methods and showing robustness to common post-processing techniques.
Contribution
The paper introduces DinoLizer, a novel DINOv2-based framework with a linear classification head and sliding-window strategy for improved manipulation localization in generative inpainting.
Findings
DinoLizer surpasses state-of-the-art local manipulation detectors on a range of inpainting datasets.
It remains robust under resizing, noise, and compression.
It achieves a 12% higher Intersection-over-Union (IoU) than previous models.
Abstract
We introduce DinoLizer, a DINOv2-based model for localizing manipulated regions in generative inpainting. Our method builds on a DINOv2 model pretrained to detect synthetic images on the B-Free dataset. We add a linear classification head on top of the Vision Transformer's patch embeddings to predict manipulations at a patch resolution. The head is trained to focus on semantically altered regions, treating non-semantic edits as part of the original content. Because the ViT accepts only fixed-size inputs, we use a sliding-window strategy to aggregate predictions over larger images; the resulting heatmaps are post-processed to refine the estimated binary manipulation masks. Empirical results show that DinoLizer surpasses state-of-the-art local manipulation detectors on a range of inpainting datasets derived from different generative models. It remains robust to common…
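The sliding-window step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only says that predictions are aggregated, so averaging overlapping windows is an assumption, as are the function names `window_starts`, `aggregate`, and `binarize`, the `predict` callback standing in for the DINOv2 backbone with its linear head, and the plain 0.5 threshold standing in for the paper's post-processing. Coordinates are grid cells, which could be patches or pixels.

```python
def window_starts(length, window, stride):
    """Start offsets of fixed-size windows that together cover [0, length)."""
    if length < window:
        raise ValueError("image is smaller than the window; pad it first")
    starts = list(range(0, length - window + 1, stride))
    if starts[-1] != length - window:
        starts.append(length - window)  # shift the last window back to the border
    return starts


def aggregate(height, width, window, stride, predict):
    """Average overlapping per-window score grids into one heatmap.

    `predict(top, left)` is a hypothetical stand-in for the model: it must
    return a window x window grid of manipulation scores in [0, 1].
    """
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for top in window_starts(height, window, stride):
        for left in window_starts(width, window, stride):
            scores = predict(top, left)
            for i in range(window):
                for j in range(window):
                    total[top + i][left + j] += scores[i][j]
                    count[top + i][left + j] += 1
    # Every cell is covered at least once, so count is never zero here.
    return [[total[y][x] / count[y][x] for x in range(width)]
            for y in range(height)]


def binarize(heatmap, threshold=0.5):
    """Placeholder post-processing: threshold the heatmap into a binary mask."""
    return [[1 if value >= threshold else 0 for value in row] for row in heatmap]
```

Shifting the last window back to the border, rather than padding, keeps every window fully inside the image; this is one possible design choice among several and is not taken from the paper.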
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Digital Media Forensic Detection
