Survey on AI-Generated Media Detection: From Non-MLLM to MLLM
Yueying Zou, Peipei Li, Zekun Li, Huaibo Huang, Xing Cui, Xuannan Liu, Chenghanyu Zhang, Ran He

TL;DR
This survey reviews the evolution of AI-generated media detection methods, comparing non-MLLM and MLLM approaches, analyzing their methodologies, challenges, and ethical considerations to guide future research.
Contribution
It provides the first comprehensive comparison of non-MLLM and MLLM-based detection methods, analyzing their differences and potential hybrids, and addressing ethical and regulatory issues.
Findings
MLLM-based detectors offer broader applicability and explainability.
Hybrid approaches show promise in improving detection accuracy.
Regulatory landscapes vary significantly across jurisdictions.
Abstract
The proliferation of AI-generated media poses significant challenges to information authenticity and social trust, creating strong demand for reliable detection methods. Methods for detecting AI-generated media have evolved rapidly, paralleling the advancement of Multimodal Large Language Models (MLLMs). Current detection approaches fall into two main groups: non-MLLM-based and MLLM-based methods. The former employs high-precision, domain-specific detectors powered by deep learning techniques, while the latter uses general-purpose detectors based on MLLMs that integrate authenticity verification, explainability, and localization capabilities. Despite significant progress in this field, the literature still lacks a comprehensive survey examining the transition from domain-specific to general-purpose detection methods. This paper addresses this gap by…
Taxonomy
Topics: Video Analysis and Summarization · Generative Adversarial Networks and Image Synthesis
