M2SFormer: Multi-Spectral and Multi-Scale Attention with Edge-Aware Difficulty Guidance for Image Forgery Localization

Ju-Hyeon Nam; Dong-Hyun Moon; Sang-Chul Lee

arXiv:2506.20922·cs.CV·June 27, 2025

TL;DR

M2SFormer is a Transformer-based framework that integrates multi-spectral and multi-scale attention with edge-aware guidance to improve pixel-level image forgery localization, especially for subtle manipulations.

Contribution

The paper introduces a unified multi-frequency and multi-scale attention mechanism, combined with a difficulty-guided module, to improve forgery localization accuracy.

Findings

01. Outperforms state-of-the-art models on benchmark datasets

02. Demonstrates superior generalization to unseen domains

03. Effectively captures subtle forgery artifacts

Abstract

Image editing techniques have rapidly advanced, facilitating both innovative use cases and malicious manipulation of digital images. Deep learning-based methods have recently achieved high accuracy in pixel-level forgery localization, yet they frequently struggle with computational overhead and limited representation power, particularly for subtle or complex tampering. In this paper, we propose M2SFormer, a novel Transformer encoder-based framework designed to overcome these challenges. Unlike approaches that process spatial and frequency cues separately, M2SFormer unifies multi-frequency and multi-scale attentions in the skip connection, harnessing global context to better capture diverse forgery artifacts. Additionally, our framework addresses the loss of fine detail during upsampling by utilizing a global prior map, a curvature metric indicating the difficulty of forgery…
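
The idea of attending over several frequency components of a feature map can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes a 2D DCT basis as the frequency decomposition (the abstract does not say which transform is used), and the names `dct_basis`, `channel_gate`, and `multi_spectral_channel_attention`, the choice of frequencies, and the mean-then-sigmoid aggregation are all hypothetical choices made for illustration.

```python
import numpy as np


def dct_basis(h, w, u, v):
    """2D DCT-II basis function of frequency (u, v) on an h x w grid."""
    ys = np.cos(np.pi * (np.arange(h) + 0.5) * u / h)
    xs = np.cos(np.pi * (np.arange(w) + 0.5) * v / w)
    return np.outer(ys, xs)  # shape (h, w)


def channel_gate(feat, freqs=((0, 0), (0, 1), (1, 0))):
    """Per-channel gate in (0, 1) computed from several DCT frequencies.

    feat: array of shape (C, H, W).
    freqs: hypothetical set of (u, v) frequency indices to pool over.
    """
    _, h, w = feat.shape
    # Project every channel onto each basis function: one scalar
    # descriptor per channel and per frequency, shape (C, F).
    desc = np.stack(
        [(feat * dct_basis(h, w, u, v)).sum(axis=(1, 2)) / (h * w)
         for u, v in freqs],
        axis=1,
    )
    # Assumed aggregation: average the descriptors, then squash with a
    # sigmoid. A trained model would use learned layers here instead.
    score = desc.mean(axis=1)
    return 1.0 / (1.0 + np.exp(-score))


def multi_spectral_channel_attention(feat, freqs=((0, 0), (0, 1), (1, 0))):
    """Reweight each channel of feat (C, H, W) by its frequency-based gate."""
    gate = channel_gate(feat, freqs)
    return feat * gate[:, None, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    skip_features = rng.normal(size=(4, 16, 16))  # stand-in for a skip connection
    out = multi_spectral_channel_attention(skip_features)
    print(out.shape)
```

With `freqs=((0, 0),)` the descriptor is the per-channel mean, so the sketch reduces to a gate driven by global average pooling; adding higher frequencies lets the gate respond to spatial variation that average pooling discards.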

Taxonomy

Topics: Digital Media Forensic Detection · Image Processing Techniques and Applications · Adversarial Robustness in Machine Learning

Methods: Layer Normalization · Dropout · Absolute Position Encodings · Dense Connections · Byte Pair Encoding · Softmax · Label Smoothing · Transformer