Attention to Detail: Global-Local Attention for High-Resolution AI-Generated Image Detection
Lawrence Han

TL;DR
This paper introduces GLASS, an architecture that combines a globally resized view of an image with original-resolution local crops, selected by stratified sampling and aggregated with attention, to improve the accuracy of AI-generated image detection.
Contribution
GLASS integrates global and local high-resolution information through stratified sampling and attention, improving the detection of AI-generated images.
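As a rough illustration of the attention step, the sketch below pools one global feature vector and several local crop feature vectors into a single vector using softmax weights. It is a minimal sketch under stated assumptions, not the paper's implementation: the function name `attention_pool`, the linear scoring vector `score_weights`, and the toy feature values are all hypothetical, and the summary here does not specify the actual scoring function.

```python
import math


def attention_pool(features, score_weights):
    """Pool feature vectors with softmax attention weights.

    features: list of equal-length vectors (e.g. one global view plus local crops).
    score_weights: hypothetical linear scoring vector (an assumption, not from the paper).
    Returns (pooled_vector, attention_weights).
    """
    # One scalar score per feature vector: dot product with the scoring vector.
    scores = [sum(w * x for w, x in zip(score_weights, f)) for f in features]
    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the feature vectors, one dimension at a time.
    dim = len(features[0])
    pooled = [sum(wt * f[d] for wt, f in zip(weights, features)) for d in range(dim)]
    return pooled, weights


# Toy usage: one "global" vector followed by two "local crop" vectors.
toy_features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
pooled, weights = attention_pool(toy_features, score_weights=[2.0, 0.0])
```

In this toy setup the first vector receives the largest weight because it has the highest score. In a trained model the scoring parameters would be learned jointly with the classifier.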
Findings
GLASS outperforms standard transfer learning methods.
It effectively leverages high-resolution details without excessive computational costs.
Experiments validate its superior performance across multiple vision backbones.
Abstract
The rapid development of generative AI has made AI-generated images increasingly realistic and high-resolution. Most AI-generated image detection architectures downsample images before passing them to the model, risking the loss of fine-grained details. This paper presents GLASS (Global-Local Attention with Stratified Sampling), an architecture that combines a globally resized view with multiple randomly sampled local crops. These crops are original-resolution regions selected efficiently through spatially stratified sampling and aggregated using attention-based scoring. GLASS can be integrated into vision models to leverage both global and local information in images of any size. Vision Transformer, ResNet, and ConvNeXt models are used as backbones, and experiments show that GLASS outperforms standard transfer learning by achieving higher predictive performance within…
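The spatially stratified sampling of local crops can be sketched as follows. This is one plausible reading, assuming a regular grid of strata with one crop per cell. The function name `stratified_crop_boxes` and the parameters `crop` and `grid` are hypothetical, and the paper's actual sampling scheme may differ.

```python
import random


def stratified_crop_boxes(height, width, crop, grid, rng=None):
    """Return grid * grid crop boxes as (top, left, bottom, right) tuples.

    Assumption: the valid range of crop origins is split into a grid x grid
    set of strata, and one crop origin is drawn uniformly from each stratum.
    Crops keep the original resolution; nothing is resized here.
    """
    if crop > height or crop > width:
        raise ValueError("crop must fit inside the image")
    rng = rng or random.Random()
    # A crop's top-left corner may lie anywhere in [0, max_top] x [0, max_left].
    max_top, max_left = height - crop, width - crop
    boxes = []
    for row in range(grid):
        # Vertical stratum for this grid row.
        top_lo = max_top * row / grid
        top_hi = max_top * (row + 1) / grid
        for col in range(grid):
            # Horizontal stratum for this grid column.
            left_lo = max_left * col / grid
            left_hi = max_left * (col + 1) / grid
            top = int(rng.uniform(top_lo, top_hi))
            left = int(rng.uniform(left_lo, left_hi))
            boxes.append((top, left, top + crop, left + crop))
    return boxes


# Toy usage: nine 224x224 crops from a 2048x3072 image, with a fixed seed.
boxes = stratified_crop_boxes(2048, 3072, crop=224, grid=3, rng=random.Random(0))
```

Compared with drawing all crop positions independently, drawing one position per stratum spreads the crops across the image, so a small number of crops is less likely to cluster in one region.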
Taxonomy
Topics: Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques
