SPAN: Spatial Pyramid Attention Network for Image Manipulation Localization
Xuefeng Hu, Zhihan Zhang, Zhenye Jiang, Syomantak Chaudhuri, Zhenheng Yang, Ram Nevatia

TL;DR
SPAN introduces a multi-scale attention-based framework for detecting and localizing various image manipulations, leveraging a pyramid of local self-attention blocks with position encoding to improve accuracy.
Contribution
The paper proposes a novel Spatial Pyramid Attention Network (SPAN) that models multi-scale patch relationships for image manipulation localization, outperforming previous methods.
Findings
Significant performance gains over previous state-of-the-art methods on standard datasets.
Effective modeling of multi-scale patch relationships.
Flexible training on synthetic data with fine-tuning capability.
Abstract
We present a novel framework, Spatial Pyramid Attention Network (SPAN), for detection and localization of multiple types of image manipulations. The proposed architecture efficiently and effectively models the relationship between image patches at multiple scales by constructing a pyramid of local self-attention blocks. The design includes a novel position projection to encode the spatial positions of the patches. SPAN is trained on a generic, synthetic dataset but can also be fine-tuned for specific datasets. The proposed method shows significant gains in performance on standard datasets over previous state-of-the-art methods.
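The idea of a pyramid of local self-attention blocks can be illustrated with a small NumPy sketch. It is not the authors' implementation: the 3x3 neighborhood, the dilation rates (1, 3, 9), the residual connection, the random weights, and the function names local_self_attention and span_pyramid are all assumptions made for illustration. The "position projection" is read here as a separate value projection for each relative neighbor offset, which is one plausible interpretation of the abstract.

```python
import numpy as np

def local_self_attention(x, w_q, w_k, w_v, dilation):
    """One local self-attention block (illustrative sketch).

    x:        (H, W, C) feature map, one feature vector per patch.
    w_q, w_k: (C, C) query and key projections.
    w_v:      (9, C, C) value projections, one per neighbor offset
              (assumed form of the position projection).
    dilation: spacing between a patch and its 3x3 neighbors.
    """
    h, w, c = x.shape
    pad = dilation
    # Zero-pad so every patch has all 9 neighbors; padded neighbors
    # contribute zero keys and values but still take part in the softmax.
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    q = x @ w_q
    offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    scores, values = [], []
    for i, (dy, dx) in enumerate(offsets):
        y0 = pad + dy * dilation
        x0 = pad + dx * dilation
        nb = xp[y0:y0 + h, x0:x0 + w]             # neighbor features, (H, W, C)
        k = nb @ w_k
        scores.append((q * k).sum(-1) / np.sqrt(c))
        values.append(nb @ w_v[i])                # offset-specific projection
    scores = np.stack(scores, axis=-1)            # (H, W, 9)
    scores -= scores.max(-1, keepdims=True)       # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(-1, keepdims=True)
    values = np.stack(values, axis=-2)            # (H, W, 9, C)
    out = (attn[..., None] * values).sum(-2)      # weighted sum over neighbors
    return x + out                                # residual connection (assumed)

def span_pyramid(x, params, dilations=(1, 3, 9)):
    """Stack blocks with growing dilation so later blocks see larger context."""
    for (w_q, w_k, w_v), d in zip(params, dilations):
        x = local_self_attention(x, w_q, w_k, w_v, d)
    return x

# Usage with random weights on a 16x16 grid of 8-dimensional patch features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(16, 16, 8))
params = [
    (rng.normal(scale=0.1, size=(8, 8)),
     rng.normal(scale=0.1, size=(8, 8)),
     rng.normal(scale=0.1, size=(9, 8, 8)))
    for _ in range(3)
]
out = span_pyramid(feats, params)
```

Each block only attends over nine neighbors, so the cost grows linearly with the number of patches, while the increasing dilation lets the stack relate patches that are far apart. In a trained model the weights would be learned and the output would feed a mask prediction head, which is omitted here.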
