TranX-Adapter: Bridging Artifacts and Semantics within MLLMs for Robust AI-generated Image Detection
Wenbin Wang, Yuge Huang, Jianqing Xu, Yue Yu, Jiangtao Yan, Shouhong Ding, Pan Zhou, Yong Luo

TL;DR
This paper introduces TranX-Adapter, a lightweight fusion module that enhances AI-generated image detection by effectively combining artifact and semantic features within multimodal large language models, leading to significant accuracy improvements.
Contribution
The paper proposes a novel TranX-Adapter with task-aware optimal-transport fusion and cross-attention mechanisms to improve feature integration in AIGI detection.
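As a rough illustration of what an optimal-transport fusion step could look like, the sketch below builds a Jensen-Shannon cost matrix between per-token prediction probabilities (the cost described in the abstract) and solves an entropic transport problem with Sinkhorn iterations. This is a minimal sketch under stated assumptions, not the paper's implementation: the use of Sinkhorn, uniform marginals, the regularization value, the barycentric fusion step, and every name (`js_divergence`, `sinkhorn`, `art_probs`, `sem_probs`, `sem_feats`) are hypothetical choices made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def js_divergence(p, q, eps=1e-12):
    """Pairwise Jensen-Shannon divergence between rows of p (n, c) and q (m, c)."""
    p = p[:, None, :]              # (n, 1, c)
    q = q[None, :, :]              # (1, m, c)
    mid = 0.5 * (p + q)            # mixture distribution
    kl_pm = np.sum(p * (np.log(p + eps) - np.log(mid + eps)), axis=-1)
    kl_qm = np.sum(q * (np.log(q + eps) - np.log(mid + eps)), axis=-1)
    return 0.5 * (kl_pm + kl_qm)   # (n, m), bounded above by ln 2

def sinkhorn(cost, reg=0.1, n_iters=200):
    """Entropic OT plan with uniform marginals (assumed; not taken from the paper)."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)        # mass on artifact tokens
    b = np.full(m, 1.0 / m)        # mass on semantic tokens
    K = np.exp(-cost / reg)        # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (K.T @ u)          # scale columns toward marginal b
        u = a / (K @ v)            # scale rows toward marginal a
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
# Hypothetical per-token real/fake probabilities from each branch.
art_probs = softmax(rng.normal(size=(4, 2)))
sem_probs = softmax(rng.normal(size=(6, 2)))
sem_feats = rng.normal(size=(6, 8))   # hypothetical semantic token features

cost = js_divergence(art_probs, sem_probs)
plan = sinkhorn(cost)
# Barycentric projection: each artifact token receives a plan-weighted
# average of the semantic features.
fused = (plan / plan.sum(axis=1, keepdims=True)) @ sem_feats
```

Because the transport plan is constrained by its marginals, its weights are not produced by a row-wise softmax over similarity scores; how the paper actually uses the plan inside the adapter is not stated in this summary.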
Findings
Improves accuracy by up to 6% on benchmarks.
Effectively fuses artifact and semantic features despite high intra-feature similarity.
Enhances robustness of MLLMs in detecting AI-generated images.
Abstract
Rapid advances in AI-generated image (AIGI) technology enable highly realistic synthesis, threatening public information integrity and security. Recent studies have demonstrated that incorporating texture-level artifact features alongside semantic features into multimodal large language models (MLLMs) can enhance their AIGI detection capability. However, our preliminary analyses reveal that artifact features exhibit high intra-feature similarity, leading to an almost uniform attention map after the softmax operation. This phenomenon causes attention dilution, thereby hindering effective fusion between semantic and artifact features. To overcome this limitation, we propose a lightweight fusion adapter, TranX-Adapter, which integrates a Task-aware Optimal-Transport Fusion that leverages the Jensen-Shannon divergence between artifact and semantic prediction probabilities as a cost matrix…
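The attention-dilution effect described in the abstract can be reproduced with a small numerical experiment. The sketch below compares scaled dot-product attention over diverse keys with attention over keys that are nearly identical to one another; the token count, dimensions, noise scale, and all names are illustrative assumptions, not values from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_stats(queries, keys):
    """Return the attention map and its mean row entropy (in nats)."""
    d = keys.shape[-1]
    attn = softmax(queries @ keys.T / np.sqrt(d))
    entropy = -np.sum(attn * np.log(attn + 1e-12), axis=-1)
    return attn, entropy.mean()

rng = np.random.default_rng(0)
n, d = 16, 64
queries = rng.normal(size=(n, d))

# Diverse keys: independent random vectors.
diverse_keys = rng.normal(size=(n, d))
# Highly similar keys: one shared vector plus small noise, standing in
# for artifact features with high intra-feature similarity.
similar_keys = rng.normal(size=(1, d)) + 0.01 * rng.normal(size=(n, d))

attn_div, ent_div = attention_stats(queries, diverse_keys)
attn_sim, ent_sim = attention_stats(queries, similar_keys)

print("mean entropy, diverse keys:", ent_div)
print("mean entropy, similar keys:", ent_sim)
print("uniform-attention entropy :", np.log(n))
```

When the keys are nearly identical, every query scores them almost equally, so each softmax row approaches the uniform distribution 1/n and its entropy approaches log(n). No single token then receives meaningfully more weight than any other, which is the dilution the abstract refers to.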
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
