TranX-Adapter: Bridging Artifacts and Semantics within MLLMs for Robust AI-generated Image Detection

Wenbin Wang; Yuge Huang; Jianqing Xu; Yue Yu; Jiangtao Yan; Shouhong Ding; Pan Zhou; Yong Luo

arXiv:2602.21716 · cs.CV · February 26, 2026

Open Access

TL;DR

This paper introduces TranX-Adapter, a lightweight fusion module that enhances AI-generated image detection by effectively combining artifact and semantic features within multimodal large language models, leading to significant accuracy improvements.

Contribution

The paper proposes TranX-Adapter, a fusion adapter that combines task-aware optimal-transport fusion with cross-attention mechanisms to improve the integration of artifact and semantic features in AI-generated image (AIGI) detection.

Findings

01. Achieves accuracy improvements of up to 6% on benchmarks.

02. Effectively fuses artifact and semantic features despite high intra-feature similarity.

03. Enhances the robustness of MLLMs in detecting AI-generated images.
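
The high intra-feature similarity mentioned in the findings matters because, as the abstract notes, it drives the softmax attention map toward uniform. The following is a minimal NumPy sketch of that effect, not the authors' analysis: the token count, feature dimension, noise scale, and the helper names `attention_weights` and `mean_entropy_ratio` are all assumptions chosen for illustration. It compares attention over diverse keys with attention over keys that are nearly identical to each other.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_weights(queries, keys):
    """Scaled dot-product attention weights; each row sums to 1."""
    d = queries.shape[-1]
    return softmax(queries @ keys.T / np.sqrt(d))

def mean_entropy_ratio(weights):
    """Mean row entropy divided by log(n). A value of 1.0 means uniform attention."""
    n = weights.shape[-1]
    ent = -(weights * np.log(weights + 1e-12)).sum(axis=-1)
    return float(ent.mean() / np.log(n))

rng = np.random.default_rng(0)
n_tokens, dim = 16, 64  # hypothetical sizes, for illustration only

queries = rng.normal(size=(n_tokens, dim))

# Diverse keys: independent random vectors.
diverse_keys = rng.normal(size=(n_tokens, dim))

# Highly similar keys: one shared vector plus small per-token noise
# (a stand-in for artifact features with high intra-feature similarity).
shared = rng.normal(size=(1, dim))
similar_keys = shared + 0.01 * rng.normal(size=(n_tokens, dim))

uniformity_diverse = mean_entropy_ratio(attention_weights(queries, diverse_keys))
uniformity_similar = mean_entropy_ratio(attention_weights(queries, similar_keys))

print(f"diverse keys: entropy ratio = {uniformity_diverse:.3f}")
print(f"similar keys: entropy ratio = {uniformity_similar:.3f}")
```

Because the shared component adds the same offset to every logit in a row, it cancels inside the softmax, and only the small noise term differentiates the keys. The entropy ratio for the similar keys therefore sits very close to 1, meaning each query spreads its attention almost evenly and no key is singled out.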

Abstract

Rapid advances in AI-generated image (AIGI) technology enable highly realistic synthesis, threatening public information integrity and security. Recent studies have demonstrated that incorporating texture-level artifact features alongside semantic features into multimodal large language models (MLLMs) can enhance their AIGI detection capability. However, our preliminary analyses reveal that artifact features exhibit high intra-feature similarity, leading to an almost uniform attention map after the softmax operation. This phenomenon causes attention dilution, thereby hindering effective fusion between semantic and artifact features. To overcome this limitation, we propose a lightweight fusion adapter, TranX-Adapter, which integrates a Task-aware Optimal-Transport Fusion that leverages the Jensen-Shannon divergence between artifact and semantic prediction probabilities as a cost matrix…
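
To make the idea of using Jensen-Shannon divergence as a transport cost concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's implementation: the per-token binary prediction probabilities, the uniform marginals, the regularization strength, and the use of Sinkhorn iterations as the solver are all assumed here, since the abstract does not specify them. The function names `js_divergence_matrix` and `sinkhorn_plan` are hypothetical.

```python
import numpy as np

def js_divergence_matrix(p, q, eps=1e-12):
    """Pairwise Jensen-Shannon divergence between rows of p (n, k) and q (m, k)."""
    p = p[:, None, :]            # (n, 1, k)
    q = q[None, :, :]            # (1, m, k)
    mix = 0.5 * (p + q)          # (n, m, k) mixture distribution
    kl_pm = (p * (np.log(p + eps) - np.log(mix + eps))).sum(-1)
    kl_qm = (q * (np.log(q + eps) - np.log(mix + eps))).sum(-1)
    return 0.5 * (kl_pm + kl_qm)  # (n, m), each entry in [0, ln 2]

def sinkhorn_plan(cost, reg=0.1, n_iters=200):
    """Entropy-regularized transport plan with uniform marginals (assumed solver)."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)      # row marginal
    b = np.full(m, 1.0 / m)      # column marginal
    K = np.exp(-cost / reg)      # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    # Column marginals are exact after the final v update;
    # row marginals hold up to the remaining convergence error.
    return u[:, None] * K * v[None, :]

# Hypothetical [p_real, p_fake] predictions for 4 artifact and 3 semantic tokens.
artifact_probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.2, 0.8]])
semantic_probs = np.array([[0.85, 0.15], [0.5, 0.5], [0.1, 0.9]])

cost = js_divergence_matrix(artifact_probs, semantic_probs)
plan = sinkhorn_plan(cost)

print(np.round(cost, 3))
print(np.round(plan, 3))
```

In this toy setup, token pairs whose predictions agree get a low cost and therefore more transport mass, while pairs that disagree get little. Unlike a softmax over nearly identical features, the plan is shaped by the marginal constraints and by task-level predictions rather than by raw feature similarity alone.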

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications