Bridging the Micro–Macro Gap: Frequency-Aware Semantic Alignment for Image Manipulation Localization
Xiaojie Liang, Zhimin Chen, Ziqi Sheng, Wei Lu

TL;DR
This paper introduces FASA, a unified framework that effectively localizes both traditional and diffusion-generated image manipulations by leveraging frequency cues and semantic priors, achieving state-of-the-art results.
Contribution
FASA is the first unified approach combining frequency-aware cues and semantic priors for comprehensive image manipulation localization.
Findings
Achieves state-of-the-art performance on OpenSDI and traditional benchmarks.
Demonstrates strong generalization across different manipulation generators and datasets.
Maintains robustness under common image degradations.
Abstract
As generative image editing advances, image manipulation localization (IML) must handle both traditional manipulations with conspicuous forensic artifacts and diffusion-generated edits that appear locally realistic. Existing methods typically rely on either low-level forensic cues or high-level semantics alone, leading to a fundamental micro–macro gap. To bridge this gap, we propose FASA, a unified framework for localizing both traditional and diffusion-generated manipulations. Specifically, we extract manipulation-sensitive frequency cues through an adaptive dual-band DCT module and learn manipulation-aware semantic priors via patch-level contrastive alignment on frozen CLIP representations. We then inject these priors into a hierarchical frequency pathway through a semantic-frequency side adapter for multi-scale feature interaction, and employ a prototype-guided, frequency-gated mask…
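The idea of splitting an image into two DCT frequency bands, as mentioned in the abstract, can be illustrated with a minimal sketch. It is not the paper's module: it assumes a fixed cutoff on the coefficient index sum, whereas the abstract describes the banding as adaptive, and the names `dct_matrix`, `split_bands`, and `cutoff` are hypothetical.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def split_bands(block, cutoff):
    """Split a square block into low- and high-frequency images.

    Assumption: a coefficient at index (u, v) counts as low-frequency
    when u + v < cutoff. This fixed rule stands in for the adaptive
    banding described in the abstract.
    """
    n = len(block)
    c = dct_matrix(n)
    ct = transpose(c)
    coeffs = matmul(matmul(c, block), ct)  # forward 2D DCT
    low = [[coeffs[u][v] if u + v < cutoff else 0.0 for v in range(n)]
           for u in range(n)]
    high = [[coeffs[u][v] - low[u][v] for v in range(n)] for u in range(n)]

    def to_pixels(m):
        return matmul(matmul(ct, m), c)  # inverse 2D DCT

    return to_pixels(low), to_pixels(high)
```

Because the DCT is linear, the two band images sum back to the input block, so a split like this discards nothing; a downstream model would only see the content separated by frequency.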
