Detecting AI-Generated Images via Contextual Anomaly Estimation in Masked AutoEncoders
Minsuk Jang, Hyunseo Jeong, Minseok Son, Changick Kim

TL;DR
This paper introduces CINEMAE, a novel detection method that combines patch-level anomaly signals and global semantic features from Masked AutoEncoders to improve the robustness of AI-generated image detection.
Contribution
CINEMAE leverages both reconstruction-based anomaly detection and semantic feature extraction from MAE for more reliable AI-generated image detection.
Findings
Achieves 96.63% accuracy on GenImage
Maintains over 93% accuracy under JPEG compression
Outperforms existing detectors in robustness
Abstract
Context-based detection methods such as DetectGPT achieve strong generalization in identifying AI-generated text by evaluating content compatibility with a model's learned distribution. In contrast, existing image detectors rely on discriminative features from pretrained backbones such as CLIP, which implicitly capture generator-specific artifacts. However, as modern generative models rapidly advance in visual fidelity, the artifacts these detectors depend on are becoming increasingly subtle or absent, undermining their reliability. Masked AutoEncoders (MAE) are inherently trained to reconstruct masked patches from visible context, naturally modeling patch-level contextual plausibility akin to conditional probability estimation, while their encoder also serves as a powerful semantic feature extractor. We propose CINEMAE, a novel architecture that exploits both capabilities of MAE…
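The idea of scoring patch-level contextual plausibility can be illustrated with a toy sketch. This is not the CINEMAE architecture, and it does not use a real MAE. The hypothetical `toy_reconstruct` function stands in for an MAE decoder by predicting a hidden patch as the mean of its visible neighbours, and each patch is hidden one at a time rather than masked in random subsets. All function names and the 8x8 example image are assumptions made for illustration.

```python
def split_patches(image, patch):
    """Split an H x W image (list of rows) into flattened, non-overlapping patches in row-major order."""
    h, w = len(image), len(image[0])
    return [
        [image[r + i][c + j] for i in range(patch) for j in range(patch)]
        for r in range(0, h, patch)
        for c in range(0, w, patch)
    ]


def toy_reconstruct(patches, grid_w, idx):
    """Hypothetical stand-in for an MAE decoder: predict patch `idx` from its visible 4-neighbours."""
    grid_h = len(patches) // grid_w
    r, c = divmod(idx, grid_w)
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    visible = [
        patches[rr * grid_w + cc]
        for rr, cc in candidates
        if 0 <= rr < grid_h and 0 <= cc < grid_w
    ]
    size = len(patches[idx])
    return [sum(p[k] for p in visible) / len(visible) for k in range(size)]


def patch_anomaly_scores(image, patch):
    """Hide each patch in turn and score it by the mean squared reconstruction error."""
    patches = split_patches(image, patch)
    grid_w = len(image[0]) // patch
    scores = []
    for idx, target in enumerate(patches):
        pred = toy_reconstruct(patches, grid_w, idx)
        scores.append(sum((a - b) ** 2 for a, b in zip(pred, target)) / len(target))
    return scores


# A flat 8x8 image with one 2x2 patch that does not fit its surroundings.
image = [[0.5] * 8 for _ in range(8)]
for i in range(2, 4):
    for j in range(2, 4):
        image[i][j] = 1.0

scores = patch_anomaly_scores(image, patch=2)
most_anomalous = scores.index(max(scores))  # the patch at grid position (1, 1)
```

A patch that its context predicts poorly receives a high score, while patches consistent with their surroundings score near zero. In an MAE-based detector, the learned decoder would replace the neighbour-averaging stand-in used here.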
Taxonomy
Topics: Handwritten Text Recognition Techniques · Advanced Neural Network Applications · Multimodal Machine Learning Applications
