MiraGe: Multimodal Discriminative Representation Learning for Generalizable AI-Generated Image Detection

Kuo Shi, Jie Lu, Shanshan Ye, Guangquan Zhang, and Zhen Fang

arXiv:2508.01525 · cs.CV · August 5, 2025

Open Access

TL;DR

MiraGe introduces a multimodal discriminative learning approach that enhances the detection of AI-generated images, especially from unseen generators, by learning generator-invariant features aligned with semantic information.

Contribution

The paper presents MiraGe, a novel method combining multimodal prompt learning with feature alignment to improve generalization in AI-generated image detection.

Findings

01. Achieves state-of-the-art performance on multiple benchmarks.

02. Maintains robustness against unseen generators like Sora.

03. Effectively leverages CLIP and text embeddings for discriminative learning.

Abstract

Recent advances in generative models have highlighted the need for robust detectors capable of distinguishing real images from AI-generated images. While existing methods perform well on known generators, their performance often declines when tested with newly emerging or unseen generative models due to overlapping feature embeddings that hinder accurate cross-generator classification. In this paper, we propose Multimodal Discriminative Representation Learning for Generalizable AI-generated Image Detection (MiraGe), a method designed to learn generator-invariant features. Motivated by theoretical insights on intra-class variation minimization and inter-class separation, MiraGe tightly aligns features within the same class while maximizing separation between classes, enhancing feature discriminability. Moreover, we apply multimodal prompt learning to further refine these principles into…
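The abstract is truncated above, so the paper's actual objective is not shown here. As a rough illustration of the stated principle only (minimize intra-class variation, maximize inter-class separation), the following is a minimal sketch under stated assumptions: the function name `discriminative_loss`, the centroid-based formulation, the hinge `margin`, and the toy 2-D features are all hypothetical and are not taken from MiraGe.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length (eps avoids division by zero)."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def discriminative_loss(features, labels, margin=1.0):
    """Toy objective, NOT the paper's loss (assumed for illustration).

    features: (n, d) array of image embeddings.
    labels:   (n,) array with 0 = real, 1 = AI-generated.
    Returns (total, intra, inter).
    """
    features = l2_normalize(np.asarray(features, dtype=float))
    labels = np.asarray(labels)

    real = features[labels == 0]
    fake = features[labels == 1]
    c_real = real.mean(axis=0)
    c_fake = fake.mean(axis=0)

    # Intra-class term: mean squared distance of each feature to its own centroid.
    intra_real = np.mean(np.sum((real - c_real) ** 2, axis=1))
    intra_fake = np.mean(np.sum((fake - c_fake) ** 2, axis=1))
    intra = 0.5 * (intra_real + intra_fake)

    # Inter-class term: hinge penalty when the centroids are closer than `margin`.
    inter = max(0.0, margin - float(np.linalg.norm(c_real - c_fake)))

    return intra + inter, intra, inter


# Hypothetical toy data: two tight, well-separated clusters.
toy_features = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.01, 1.0]]
toy_labels = [0, 0, 1, 1]
print(discriminative_loss(toy_features, toy_labels))
```

In this sketch the loss falls as same-class features cluster around their centroid and as the two centroids move at least `margin` apart, which is the qualitative behavior the abstract describes. How MiraGe combines this principle with multimodal prompt learning is not recoverable from the truncated text.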

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Face recognition and analysis