AlignGemini: Generalizable AI-Generated Image Detection Through Task-Model Alignment

Ruoxin Chen; Jiahui Gao; Kaiqing Lin; Keyue Zhang; Yandan Zhao; Isabel Guan; Taiping Yao; Shouhong Ding

arXiv:2512.06746 · cs.CV · February 2, 2026

TL;DR

AlignGemini introduces a dual-branch detector that combines semantic analysis with pixel-artifact analysis. Aligning each model's specialization with the requirements of its subtask significantly improves generalization in AI-generated image detection.

Contribution

The paper proposes the Task-Model Alignment principle and implements it in a two-branch detector, enhancing generalization in AI-generated image detection.
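The two-branch idea can be illustrated with a minimal sketch. It is not the authors' implementation: the class name TwoBranchDetector, the branch callables, the 0.5 threshold, and especially the max-based fusion rule are all assumptions made for illustration, since this page does not state how the paper combines the two branches. Each branch is assumed to return a score in [0, 1], where higher means "more likely AI-generated".

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class TwoBranchDetector:
    """Hypothetical two-branch detector (illustrative names, not from the paper)."""

    # Assumed: a semantics-oriented scorer, e.g. a fine-tuned VLM.
    semantic_branch: Callable[[Any], float]
    # Assumed: a pixel-artifact scorer, e.g. a conventional vision model.
    pixel_branch: Callable[[Any], float]
    # Assumed decision threshold; not a value reported in the paper.
    threshold: float = 0.5

    def score(self, image: Any) -> float:
        # Assumed fusion rule: take the larger of the two branch scores,
        # so the image is flagged if either branch finds evidence.
        semantic_score = self.semantic_branch(image)
        pixel_score = self.pixel_branch(image)
        return max(semantic_score, pixel_score)

    def predict(self, image: Any) -> bool:
        # True means "predicted AI-generated".
        return self.score(image) >= self.threshold


# Toy usage with constant stand-in branches (real branches would be models).
detector = TwoBranchDetector(
    semantic_branch=lambda image: 0.2,  # no semantic inconsistency found
    pixel_branch=lambda image: 0.9,     # strong low-level artifact signal
)
is_generated = detector.predict(image=None)
```

In this toy setup the pixel branch alone pushes the fused score over the threshold, which is the point of keeping the branches separate: each one covers the subtask the other handles poorly. Other fusion rules (averaging, a learned combiner) would fit the same interface.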

Findings

01 Improved average accuracy by 9.5% on benchmarks

02 Semantic supervision enhances generalization to unseen data

03 Pixel-artifact supervision captures low-level cues effectively

Abstract

Vision Language Models (VLMs) are increasingly used for detecting AI-generated images (AIGI). However, converting VLMs into reliable detectors is resource-intensive, and the resulting models often suffer from hallucination and poor generalization. To investigate the root cause, we conduct an empirical analysis and identify two consistent behaviors. First, fine-tuning VLMs with semantic supervision improves semantic discrimination and generalizes well to unseen data. Second, fine-tuning VLMs with pixel-artifact supervision leads to weak generalization. These findings reveal a fundamental task-model misalignment. VLMs are optimized for high-level semantic reasoning and lack inductive bias toward low-level pixel artifacts. In contrast, conventional vision models effectively capture pixel-level artifacts but are less sensitive to semantic inconsistencies. This indicates that different…

Code & Models

Datasets

Gaffeyzz/AIGI-Now
dataset · 28 dl

Taxonomy

Topics: Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning