Swin Transformer for Robust CGI Images Detection: Intra- and Inter-Dataset Analysis across Multiple Color Spaces
Preeti Mehta, Aman Sagar, Suchi Kumari

TL;DR
This paper introduces a Swin Transformer-based model for robust CGI detection across multiple datasets and color spaces, reporting higher accuracy and better domain generalization than CNN-based models.
Contribution
It proposes a Swin Transformer-based model tailored for CGI detection, using the architecture's hierarchical features and an analysis across multiple color spaces to improve robustness and generalization.
Findings
The RGB color space yields the highest accuracy
Swin Transformer outperforms CNN models like VGG-19 and ResNet-50
Model demonstrates strong intra- and inter-dataset robustness
Abstract
This study aims to address the growing challenge of distinguishing computer-generated imagery (CGI) from authentic digital images across three different color spaces: RGB, YCbCr, and HSV. Given the limitations of existing classification methods in handling the complexity and variability of CGI, this research proposes a Swin Transformer-based model for accurate differentiation between natural and synthetic images. The proposed model leverages the Swin Transformer's hierarchical architecture to capture local and global features for distinguishing CGI from natural images. Its performance was assessed through intra- and inter-dataset testing across three datasets: CiFAKE, JSSSTU, and Columbia. The model was evaluated individually on each dataset (D1, D2, D3) and on the combined datasets (D1+D2+D3) to test its robustness and domain generalization. To address dataset imbalance, data…
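The three color spaces named in the abstract are different representations of the same pixel data. As a minimal sketch of what that means per pixel, the Python below converts one 8-bit RGB value to YCbCr and HSV. It assumes the full-range BT.601 (JFIF) coefficients for YCbCr and uses the standard-library `colorsys` module for HSV. The function names are hypothetical, and the paper's own preprocessing may use different conventions or libraries.

```python
import colorsys


def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range YCbCr.

    Assumption: BT.601 / JFIF coefficients; the paper does not
    state which YCbCr variant it uses.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def rgb_to_hsv(r, g, b):
    """Convert one 8-bit RGB pixel to HSV, each component in [0, 1]."""
    return colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)


def to_color_spaces(pixel):
    """Return the same pixel expressed in all three color spaces."""
    r, g, b = pixel
    return {
        "RGB": (r, g, b),
        "YCbCr": rgb_to_ycbcr(r, g, b),
        "HSV": rgb_to_hsv(r, g, b),
    }


if __name__ == "__main__":
    # Print the three representations of an arbitrary example pixel.
    for name, value in to_color_spaces((200, 120, 40)).items():
        print(name, value)
```

In a full pipeline the same conversion would be applied to every pixel of an image before it is passed to the classifier, so that one model configuration can be trained and compared per color space.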
Taxonomy
Methods: Attention Is All You Need · Stochastic Depth · Linear Layer · Layer Normalization · Multi-Head Attention · Dense Connections · Softmax · Swin Transformer · Position-Wise Feed-Forward Layer · Absolute Position Encodings
