BusterX++: Towards Unified Cross-Modal AI-Generated Content Detection and Explanation with MLLM

Haiquan Wen; Tianxiao Li; Zhenglin Huang; Yiwei He; Guangliang Cheng

arXiv:2507.14632 · cs.CV · January 7, 2026

TL;DR

BusterX++ is a unified framework leveraging multimodal large language models and reinforcement learning to detect and explain synthetic images and videos, addressing limitations of single-modality detection methods.

Contribution

The paper introduces BusterX++, a novel unified detection framework for multimodal synthetic media, and GenBuster++, a comprehensive benchmark for evaluation.

Findings

01. Effective detection of synthetic images and videos.
02. High detection accuracy and strong generalization.
03. GenBuster++, a benchmark of 4,000 curated media samples.

Abstract

Recent advances in generative AI have dramatically improved image and video synthesis, significantly increasing the risk of misinformation spread through sophisticated fake content. In response, detection methods have evolved from traditional approaches to multimodal large language models (MLLMs), which offer greater transparency and interpretability in identifying synthetic media. However, current detection systems remain fundamentally limited by their single-modality design. They analyze images and videos separately, which makes them ineffective against synthetic content that combines multiple media formats. To address these challenges, we introduce BusterX++, a framework for the unified detection and explanation of synthetic images and videos, trained with a direct reinforcement learning (RL) post-training strategy. To enable comprehensive evaluation, we also present…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)