SPAN: Spatial Pyramid Attention Network for Image Manipulation Localization

Xuefeng Hu; Zhihan Zhang; Zhenye Jiang; Syomantak Chaudhuri; Zhenheng Yang; Ram Nevatia

arXiv:2009.00726·cs.CV·January 15, 2021

TL;DR

SPAN introduces a multi-scale, attention-based framework for detecting and localizing several types of image manipulation. It builds a pyramid of local self-attention blocks and uses a position projection to encode where each patch lies, which improves localization accuracy.

Contribution

The paper proposes the Spatial Pyramid Attention Network (SPAN), which models relationships between image patches at multiple scales for image manipulation localization and outperforms previous state-of-the-art methods on standard datasets.
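To make "multiple scales" concrete, here is a worked receptive-field calculation under an assumption that this page does not state: each block attends over a 3×3 neighborhood, and block k (k = 0, 1, ..., K-1) uses dilation d_k = 3^k. Each block extends a pixel's reach by d_k in every direction, so after K stacked blocks the reach r_K and the window width are:

```latex
r_K = \sum_{k=0}^{K-1} d_k = \sum_{k=0}^{K-1} 3^k = \frac{3^K - 1}{2},
\qquad
\text{width}_K = 2 r_K + 1 = 3^K .
```

For example, K = 5 blocks (dilations 1, 3, 9, 27, 81) would give a reach of 121 pixels and a 243 × 243 window while each block still compares only 9 positions; a different dilation schedule would change these numbers.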

Findings

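01. Significant performance improvements over previous state-of-the-art methods on standard datasets.

02. Effective modeling of relationships between image patches at multiple scales.

03. Flexible training: the model is trained on a generic synthetic dataset and can be fine-tuned for specific datasets.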

Abstract

We present a novel framework, Spatial Pyramid Attention Network (SPAN), for detection and localization of multiple types of image manipulations. The proposed architecture efficiently and effectively models the relationship between image patches at multiple scales by constructing a pyramid of local self-attention blocks. The design includes a novel position projection to encode the spatial positions of the patches. SPAN is trained on a generic, synthetic dataset but can also be fine-tuned for specific datasets. The proposed method shows significant gains in performance on standard datasets over previous state-of-the-art methods.
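The abstract describes a pyramid of local self-attention blocks combined with a position projection. The following minimal, dependency-free sketch shows one way such a pyramid could work. It is not the authors' code, and several parts are assumptions: each block attends over a 3×3 neighborhood at a given dilation, the "position projection" is modeled as a separate C×C matrix per relative offset that is applied to each neighbor's feature, out-of-bounds neighbors are skipped, weights are random rather than learned, and the dilation schedule [1, 3] is hypothetical. The names local_attention, spatial_pyramid and make_block_params are illustrative only.

```python
import math
import random

# 3x3 neighborhood offsets (dy, dx); an assumed neighborhood shape.
OFFSETS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(m, x):
    # Multiply a matrix (list of rows) by a vector.
    return [dot(row, x) for row in m]


def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(n, rng):
    return [[rng.gauss(0.0, 0.5) for _ in range(n)] for _ in range(n)]


def make_block_params(c, rng):
    # Hypothetical parameters for one block: query/key/value maps plus one
    # "position projection" matrix per relative offset (random, not learned).
    return {
        "wq": random_matrix(c, rng),
        "wk": random_matrix(c, rng),
        "wv": random_matrix(c, rng),
        "pos": {off: random_matrix(c, rng) for off in OFFSETS},
    }


def local_attention(feat, dilation, p):
    """Self-attention over the dilated 3x3 neighborhood of every pixel.

    feat is an H x W grid of length-C feature lists; the output has the same
    shape. Out-of-bounds neighbors are skipped rather than padded.
    """
    h, w = len(feat), len(feat[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            q = matvec(p["wq"], feat[y][x])
            keys, values = [], []
            for dy, dx in OFFSETS:
                ny, nx = y + dy * dilation, x + dx * dilation
                if 0 <= ny < h and 0 <= nx < w:
                    # Encode the neighbor's relative position by projecting it
                    # with the matrix assigned to its offset (assumed form).
                    nb = matvec(p["pos"][(dy, dx)], feat[ny][nx])
                    keys.append(matvec(p["wk"], nb))
                    values.append(matvec(p["wv"], nb))
            scale = math.sqrt(len(q))
            weights = softmax([dot(q, k) / scale for k in keys])
            attended = [sum(wt * v[i] for wt, v in zip(weights, values))
                        for i in range(len(q))]
            # Residual connection keeps the output the same size as the input.
            row.append([a + b for a, b in zip(feat[y][x], attended)])
        out.append(row)
    return out


def spatial_pyramid(feat, dilations, params):
    # Stack blocks with growing dilation so later blocks relate more distant patches.
    for d, p in zip(dilations, params):
        feat = local_attention(feat, d, p)
    return feat


rng = random.Random(0)
C, H, W = 4, 9, 9
feat = [[[rng.random() for _ in range(C)] for _ in range(W)] for _ in range(H)]
dilations = [1, 3]  # hypothetical schedule
params = [make_block_params(C, rng) for _ in dilations]
out = spatial_pyramid(feat, dilations, params)
print(len(out), len(out[0]), len(out[0][0]))  # 9 9 4
```

With dilations [1, 3], the output at pixel (0, 0) depends only on inputs within 4 pixels (1 + 3) of it. Each block compares just 9 positions, but stacking blocks with growing dilation widens the context. A practical model would use a tensor library and learned weights instead of nested Python lists.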