Frequency-Aware Semantic Fusion with Gated Injection for AI-generated Image Detection
Shuchang Zhou, Shangkun Wu, Jiwei Wei, Ke Liu, Ran Ran, Caiyan Qin, Yang Yang

TL;DR
This paper introduces FGINet, which combines a band-masked frequency encoder with hierarchical gated injection of frequency cues so that AI-generated image detectors generalize better to unseen generators.
Contribution
It proposes a Frequency-aware Gated Injection Network with a band-masked frequency encoder and hierarchical gated frequency injection to improve detection generalization.
Findings
FGINet achieves state-of-the-art performance on multiple datasets.
The band-masked frequency encoder reduces reliance on generator-specific cues.
Hierarchical gated injection aligns frequency cues with model hierarchy.
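The two components named in the findings above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the summary does not specify how bands are defined, how masking is scheduled, or how the gate is computed. The radial band partition, the single dropped band, the sigmoid gate, and all names (`radial_band_index`, `band_mask`, `gated_injection`, `w_proj`, `w_gate`) are assumptions made only for illustration.

```python
import numpy as np

def radial_band_index(h, w, num_bands):
    """Assign each coefficient of a centered 2-D spectrum to a radial band (assumed partition)."""
    yy, xx = np.meshgrid(np.arange(h) - h // 2, np.arange(w) - w // 2, indexing="ij")
    radius = np.sqrt(yy ** 2 + xx ** 2)
    # Scale radii so that band 0 holds the lowest frequencies and the last band the highest.
    bands = np.floor(radius / (radius.max() + 1e-8) * num_bands).astype(int)
    return np.clip(bands, 0, num_bands - 1)

def band_mask(image, num_bands=4, drop_band=0):
    """Zero out one frequency band of a grayscale image and return the filtered image."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))       # move DC to the center
    bands = radial_band_index(image.shape[0], image.shape[1], num_bands)
    spectrum = np.where(bands == drop_band, 0, spectrum)  # mask the chosen band
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum)))

def gated_injection(semantic, freq, w_proj, w_gate):
    """Add projected frequency features to semantic features through a sigmoid gate.

    semantic: (n, d), freq: (n, f), w_proj: (f, d), w_gate: (d + f, d).
    The gate form is an assumption, not taken from the paper.
    """
    joint = np.concatenate([semantic, freq], axis=-1)
    gate = 1.0 / (1.0 + np.exp(-(joint @ w_gate)))        # per-feature gate in (0, 1)
    return semantic + gate * (freq @ w_proj)
```

In a hierarchical variant, one would presumably call something like `gated_injection` at several depths of the backbone with separate weights per depth; masking a different band on each training step is one plausible way to discourage reliance on any single band, though the source does not confirm either detail.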
Abstract
AI-generated images are becoming increasingly realistic and diverse, posing significant challenges for generalizable detection. While Vision Foundation Models (VFMs) provide rich semantic representations and frequency-based methods capture complementary artifact cues, existing approaches that combine these modalities still suffer from limited generalization, with notable performance degradation on unseen generative models. We attribute this limitation to two key factors: frequency shortcut bias toward easily distinguishable cues associated with specific generators and cross-domain representation conflict between high-level semantics and low-level frequency patterns. To address these issues, we propose a Frequency-aware Gated Injection Network (FGINet) to improve generalization. Specifically, we design a Band-Masked Frequency Encoder (BMFE) that applies cross-band masking in the…
