TL;DR
Ivy-Fake introduces a large-scale, explainable multimodal benchmark and a reinforcement learning detector for improved detection and interpretability of AI-generated images and videos.
Contribution
It provides the first large-scale explainable benchmark for multimodal AIGC detection, covering both images and videos, and a novel RL-based detection model.
Findings
The dataset contains over 106K richly annotated training samples and 5,000 verified evaluation examples.
The proposed Ivy-xDetector achieves 96.32% on GenImage, surpassing previous methods.
Experiments validate the dataset's diversity and the model's effectiveness.
Abstract
The rapid development of Artificial Intelligence Generated Content (AIGC) techniques has enabled the creation of high-quality synthetic content, but it also raises significant security concerns. Current detection methods face two major limitations: (1) the lack of multidimensional explainable datasets for generated images and videos. Existing open-source datasets (e.g., WildFake, GenVideo) rely on oversimplified binary annotations, which restrict the explainability and trustworthiness of trained detectors. (2) Prior MLLM-based forgery detectors (e.g., FakeVLM) exhibit insufficiently fine-grained interpretability in their step-by-step reasoning, which hinders reliable localization and explanation. To address these challenges, we introduce Ivy-Fake, the first large-scale multimodal benchmark for explainable AIGC detection. It consists of over 106K richly annotated training samples (images…
