NS-Net: Decoupling CLIP Semantic Information through NULL-Space for Generalizable AI-Generated Image Detection
Jiazhen Yan, Fan Wang, Weiwei Jiang, Ziqiang Li, Zhangjie Fu

TL;DR
This paper introduces NS-Net, a novel detection framework that uses NULL-Space projection and contrastive learning to improve generalization in AI-generated image detection across diverse models.
Contribution
NS-Net decouples semantic information from CLIP features and employs a patch selection strategy, enhancing detection accuracy and generalization to unseen generative models.
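The decoupling step can be illustrated with a minimal NumPy sketch of a generic null-space projection. It is not the authors' implementation. It assumes the semantic directions are already available as the rows of a matrix S; how NS-Net obtains them is not stated in this summary. The function names are hypothetical, and random arrays stand in for CLIP features.

```python
import numpy as np


def null_space_projector(semantic_basis: np.ndarray) -> np.ndarray:
    """Return P = I - S^+ S, the orthogonal projector onto the null space of S.

    semantic_basis: (k, d) array whose rows span the assumed semantic subspace.
    """
    s = np.asarray(semantic_basis, dtype=np.float64)
    d = s.shape[1]
    # pinv(s) @ s projects onto the row space of S; subtracting it from the
    # identity leaves only the component orthogonal to every row of S.
    return np.eye(d) - np.linalg.pinv(s) @ s


def remove_semantics(features: np.ndarray, projector: np.ndarray) -> np.ndarray:
    """Project (n, d) feature rows onto the null space (P is symmetric)."""
    return np.asarray(features, dtype=np.float64) @ projector


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S = rng.standard_normal((4, 16))   # stand-in for semantic directions
    X = rng.standard_normal((8, 16))   # stand-in for CLIP visual features
    P = null_space_projector(S)
    Z = remove_semantics(X, P)
    # Projected features have no component along any semantic direction.
    assert np.allclose(Z @ S.T, 0.0, atol=1e-9)
    # A projector is idempotent: applying it twice changes nothing.
    assert np.allclose(P @ P, P, atol=1e-9)
```

In this sketch the projected features Z would then feed a downstream detector. The contrastive learning and patch selection stages described in the paper are not shown.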
Findings
NS-Net reports a 7.4% improvement in detection accuracy over existing methods.
It generalizes well across 40 diverse generative models.
The approach effectively captures intrinsic differences between real and fake images.
Abstract
The rapid progress of generative models, such as GANs and diffusion models, has facilitated the creation of highly realistic images, raising growing concerns over their misuse in security-sensitive domains. While existing detectors perform well under known generative settings, they often fail to generalize to unknown generative models, especially when semantic content between real and fake images is closely aligned. In this paper, we revisit the use of CLIP features for AI-generated image detection and uncover a critical limitation: the high-level semantic information embedded in CLIP's visual features hinders effective discrimination. To address this, we propose NS-Net, a novel detection framework that leverages NULL-Space projection to decouple semantic information from CLIP's visual features, followed by contrastive learning to capture intrinsic distributional differences between…
