Swin Transformer for Robust CGI Images Detection: Intra- and Inter-Dataset Analysis across Multiple Color Spaces

Preeti Mehta; Aman Sagar; Suchi Kumari

arXiv:2505.16253 · cs.CV · May 23, 2025

TL;DR

This paper introduces a Swin Transformer-based model for robust CGI image detection across multiple datasets and color spaces, demonstrating higher accuracy and better domain generalization than CNN-based models.

Contribution

The paper proposes a novel Swin Transformer architecture tailored to CGI detection, leveraging hierarchical features and multi-color space analysis to improve robustness and generalization.
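This summary does not give the paper's Swin configuration (window size, stage depths, embedding width). To illustrate what "hierarchical features" means in a Swin Transformer, here is a minimal pure-Python sketch of three generic building blocks from the original Swin design: splitting a feature map into non-overlapping M x M windows for local self-attention, the cyclic shift applied before shifted-window attention, and patch merging, which halves spatial resolution between stages. The helper names (`window_partition`, `cyclic_shift`, `patch_merge`) are hypothetical. Attention itself is omitted, as is the learned linear projection that follows patch merging.

```python
from typing import List

# A feature map stored as nested lists: rows (H) x columns (W) x channels (C).
Grid = List[List[List[float]]]


def window_partition(x: Grid, m: int) -> List[Grid]:
    """Split an H x W x C map into non-overlapping m x m windows, in row-major order."""
    h, w = len(x), len(x[0])
    if h % m or w % m:
        raise ValueError("H and W must be divisible by the window size")
    return [
        [row[j:j + m] for row in x[i:i + m]]
        for i in range(0, h, m)
        for j in range(0, w, m)
    ]


def cyclic_shift(x: Grid, s: int) -> Grid:
    """Roll the map up and to the left by s positions (the shifted-window step)."""
    h, w = len(x), len(x[0])
    return [[x[(i + s) % h][(j + s) % w] for j in range(w)] for i in range(h)]


def patch_merge(x: Grid) -> Grid:
    """Concatenate each 2 x 2 neighbourhood along channels: (H, W, C) -> (H/2, W/2, 4C).

    Swin follows this with LayerNorm and a linear layer mapping 4C -> 2C (omitted here).
    """
    h, w = len(x), len(x[0])
    return [
        [x[i][j] + x[i + 1][j] + x[i][j + 1] + x[i + 1][j + 1] for j in range(0, w, 2)]
        for i in range(0, h, 2)
    ]


# Toy 8 x 8 single-channel map whose value encodes its position (i * W + j).
H = W = 8
fmap: Grid = [[[float(i * W + j)] for j in range(W)] for i in range(H)]

windows = window_partition(fmap, 4)  # 4 windows of 4 x 4 tokens each
shifted = cyclic_shift(fmap, 2)      # window boundaries now fall on different pixels
merged = patch_merge(fmap)           # 4 x 4 map with 4 channels per token
print(len(windows), len(merged), len(merged[0]), len(merged[0][0]))
```

Alternating regular and shifted windows lets information cross window borders, and each patch-merging step coarsens the grid, so early stages attend to fine local texture while later stages cover much larger regions. That combination is one plausible reading of the "local and global features" the abstract attributes to the architecture.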

Findings

01 RGB color space yields the highest accuracy
02 Swin Transformer outperforms CNN models such as VGG-19 and ResNet-50
03 The model demonstrates strong intra- and inter-dataset robustness
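The abstract names the datasets (D1 = CiFAKE, D2 = JSSSTU, D3 = Columbia) and the three color spaces, but this summary does not list which train/test pairings were actually run. The sketch below enumerates one plausible experiment grid under stated assumptions: intra-dataset means training and testing on disjoint splits of the same source (including the combined D1+D2+D3 pool), and inter-dataset means training on one single dataset and testing on a different one. Pairings between the combined pool and one of its members are skipped, since the pool already contains that dataset's distribution and the run is neither cleanly intra- nor inter-dataset. The function `evaluation_grid` is hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

# Dataset labels as given in the abstract.
DATASETS: Dict[str, str] = {"D1": "CiFAKE", "D2": "JSSSTU", "D3": "Columbia"}
COLOR_SPACES: Tuple[str, ...] = ("RGB", "YCbCr", "HSV")

Run = Tuple[str, str, str, str]  # (color space, train source, test source, kind)


def evaluation_grid(datasets: Sequence[str], spaces: Sequence[str]) -> List[Run]:
    """Enumerate intra- and inter-dataset runs for every color space (assumed protocol)."""
    combined = "+".join(datasets)
    sources = list(datasets) + [combined]
    runs: List[Run] = []
    for space, train, test in product(spaces, sources, sources):
        if train == test:
            kind = "intra"   # disjoint train/test splits of the same source
        elif combined in (train, test):
            continue         # combined pool vs. one of its members: skipped
        else:
            kind = "inter"   # train on one dataset, test on a different one
        runs.append((space, train, test, kind))
    return runs


runs = evaluation_grid(list(DATASETS), COLOR_SPACES)
for space, train, test, kind in runs[:4]:
    print(f"{space:5s} train={train:8s} test={test:8s} {kind}")
```

Under these assumptions each color space yields 4 intra-dataset and 6 inter-dataset runs; the inter-dataset runs are the ones that probe domain generalization.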

Abstract

This study addresses the growing challenge of distinguishing computer-generated imagery (CGI) from authentic digital images across three color spaces: RGB, YCbCr, and HSV. Given the limitations of existing classification methods in handling the complexity and variability of CGI, this research proposes a Swin Transformer-based model for accurate differentiation between natural and synthetic images. The proposed model leverages the Swin Transformer's hierarchical architecture to capture local and global features for distinguishing CGI from natural images. Its performance was assessed through intra- and inter-dataset testing across three datasets: CiFAKE, JSSSTU, and Columbia. The model was evaluated individually on each dataset (D1, D2, D3) and on the combined datasets (D1+D2+D3) to test its robustness and domain generalization. To address dataset imbalance, data…
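This summary does not say which conversion formulas or library the authors used to obtain the YCbCr and HSV views. The sketch below assumes the full-range ITU-R BT.601 (JPEG/JFIF) RGB-to-YCbCr equations and the Python standard-library `colorsys` HSV conversion, applied per pixel to 8-bit RGB input; the helper names (`rgb_to_ycbcr`, `rgb_to_hsv`, `convert_image`) are hypothetical. Libraries differ in channel order and scaling (OpenCV, for instance, returns YCrCb order and stores 8-bit hue in [0, 180]), so the paper's preprocessing may differ.

```python
import colorsys
from typing import Callable, Dict, List, Tuple

Pixel = Tuple[float, float, float]
Image = List[List[Pixel]]  # rows of (R, G, B) pixels with 8-bit values in [0, 255]


def rgb_to_ycbcr(p: Pixel) -> Pixel:
    """Full-range BT.601 (JPEG/JFIF) conversion; Cb and Cr are centred on 128."""
    r, g, b = p
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return (y, cb, cr)


def rgb_to_hsv(p: Pixel) -> Pixel:
    """HSV via colorsys, which expects and returns values in [0, 1]."""
    r, g, b = (c / 255.0 for c in p)
    return colorsys.rgb_to_hsv(r, g, b)


CONVERTERS: Dict[str, Callable[[Pixel], Pixel]] = {
    "RGB": lambda p: (float(p[0]), float(p[1]), float(p[2])),
    "YCbCr": rgb_to_ycbcr,
    "HSV": rgb_to_hsv,
}


def convert_image(img: Image, space: str) -> Image:
    """Return a copy of img expressed in the requested color space."""
    fn = CONVERTERS[space]
    return [[fn(p) for p in row] for row in img]


# A 1 x 2 toy image: one white pixel and one pure-red pixel.
toy: Image = [[(255.0, 255.0, 255.0), (255.0, 0.0, 0.0)]]
views = {space: convert_image(toy, space) for space in CONVERTERS}
for space, img in views.items():
    print(space, [tuple(round(v, 3) for v in px) for px in img[0]])
```

Each view is still a three-channel image, so under this sketch's assumptions the same backbone can consume RGB, YCbCr, or HSV input without architectural changes; only the preprocessing step differs between the three experiments.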

Taxonomy

Methods: Attention Is All You Need · Stochastic Depth · Linear Layer · Layer Normalization · Multi-Head Attention · Dense Connections · Softmax · Swin Transformer · Position-Wise Feed-Forward Layer · Absolute Position Encodings