Rethinking the Use of Vision Transformers for AI-Generated Image Detection

NaHyeon Park; Kunhee Kim; Junsuk Choe; Hyunjung Shim

arXiv:2512.04969·cs.CV·December 5, 2025

Open Access

TL;DR

This paper analyzes layer-wise features from CLIP-ViT for AI-generated image detection, revealing earlier layers' effectiveness and introducing MoLD, a dynamic multi-layer feature integration method that improves detection accuracy and robustness.

Contribution

The paper systematically studies how individual ViT layers contribute to AI-generated image detection and proposes MoLD, an adaptive multi-layer feature integration method that improves detection performance and generalization.

Findings

01. Earlier layers provide more localized and generalizable features, often surpassing final-layer features in detection.

02. MoLD outperforms existing methods on both GAN- and diffusion-generated images.

03. The approach generalizes to other ViT models such as DINOv2.

Abstract

Rich feature representations derived from CLIP-ViT have been widely utilized in AI-generated image detection. While most existing methods primarily leverage features from the final layer, we systematically analyze the contributions of layer-wise features to this task. Our study reveals that earlier layers provide more localized and generalizable features, often surpassing the performance of final-layer features in detection tasks. Moreover, we find that different layers capture distinct aspects of the data, each contributing uniquely to AI-generated image detection. Motivated by these findings, we introduce a novel adaptive method, termed MoLD, which dynamically integrates features from multiple ViT layers using a gating-based mechanism. Extensive experiments on both GAN- and diffusion-generated images demonstrate that MoLD significantly improves detection performance, enhances…
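To make the idea of gating-based multi-layer integration concrete, here is a minimal sketch in plain Python. It is not the authors' MoLD implementation: the function names, the dot-product gate scoring, and the random stand-in features are all assumptions chosen for illustration. It only shows the general pattern of computing an input-dependent weight per layer with a softmax and taking the weighted sum of the per-layer features.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_layer_fusion(layer_feats, gate_weights):
    """Fuse per-layer feature vectors with input-dependent softmax gates.

    layer_feats:  list of L vectors (one per ViT layer), each of length D.
    gate_weights: list of L vectors of length D. These are hypothetical
                  learned parameters; layer l's gate score is the dot
                  product of its features with its weight vector.
    Returns (fused vector of length D, list of L gate values).
    """
    # One scalar score per layer, computed from that layer's own features.
    scores = [
        sum(f * w for f, w in zip(feat, wts))
        for feat, wts in zip(layer_feats, gate_weights)
    ]
    # Normalize the scores so the gates are positive and sum to 1.
    gates = softmax(scores)
    # Weighted sum of the layer features, dimension by dimension.
    dim = len(layer_feats[0])
    fused = [
        sum(g * feat[d] for g, feat in zip(gates, layer_feats))
        for d in range(dim)
    ]
    return fused, gates


# Toy usage: random stand-ins for features from 4 layers, dimension 8.
# Real features would come from intermediate ViT layers (assumption).
rng = random.Random(0)
num_layers, dim = 4, 8
layer_feats = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_layers)]
gate_weights = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(num_layers)]
fused, gates = gated_layer_fusion(layer_feats, gate_weights)
```

In this sketch the fused vector would feed a binary real-versus-generated classifier. Because the gates depend on the input features, different images can emphasize different layers, which is the property the abstract attributes to a dynamic integration scheme.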

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning