VideoVeritas: AI-Generated Video Detection via Perception Pretext Reinforcement Learning

Hao Tan; Jun Lan; Senyuan Shi; Zichang Tan; Zijian Yu; Huijia Zhu; Weiqiang Wang; Jun Wan; Zhen Lei

arXiv:2602.08828·cs.CV·February 10, 2026

TL;DR

VideoVeritas is a framework for detecting AI-generated videos that combines fine-grained perception with fact-based reasoning. It trains a multi-modal large language model with perception pretext reinforcement learning and introduces a new dataset, MintVid, to evaluate robustness and the balance between perception and reasoning.

Contribution

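The paper proposes Perception Pretext Reinforcement Learning (PPRL), which improves detection by training on simple perception tasks (spatiotemporal grounding and self-supervised object counting) instead of the detection task directly, and MintVid, a high-quality dataset of AI-generated videos for evaluation.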

Findings

01. VideoVeritas outperforms existing detection methods across multiple benchmarks.

02. The framework achieves balanced reasoning and perception capabilities.

03. The MintVid dataset (3K videos from 9 state-of-the-art generators) provides a resource for future research.

Abstract

The growing capability of video generation poses escalating security risks, making reliable detection increasingly essential. In this paper, we introduce VideoVeritas, a framework that integrates fine-grained perception and fact-based reasoning. We observe that while current multi-modal large language models (MLLMs) exhibit strong reasoning capacity, their granular perception ability remains limited. To mitigate this, we introduce Joint Preference Alignment and Perception Pretext Reinforcement Learning (PPRL). Specifically, rather than directly optimizing for the detection task, we adopt general spatiotemporal grounding and self-supervised object counting in the RL stage, enhancing detection performance with simple perception pretext tasks. To facilitate robust evaluation, we further introduce MintVid, a lightweight yet high-quality dataset containing 3K videos from 9 state-of-the-art generators,…
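The abstract states that PPRL swaps a direct detection objective for two perception pretext tasks, spatiotemporal grounding and self-supervised object counting, during the RL stage, but it does not say how those tasks are rewarded. As a minimal sketch under our own assumptions, the code uses rule-based verifiable rewards: grounding is scored as an average of temporal IoU over a predicted time span and spatial IoU over a predicted bounding box, and counting is scored by relative error. The names `GroundingPred`, `grounding_reward`, and `counting_reward`, the prediction format, and the 0.5/0.5 weighting are all hypothetical and are not taken from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GroundingPred:
    # Hypothetical prediction format: a time span in seconds plus one box (x1, y1, x2, y2).
    t_start: float
    t_end: float
    box: tuple[float, float, float, float]


def temporal_iou(a: tuple[float, float], b: tuple[float, float]) -> float:
    """IoU of two time intervals; malformed intervals (end <= start) score 0."""
    if a[1] <= a[0] or b[1] <= b[0]:
        return 0.0
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])  # exact when the spans overlap; IoU is 0 otherwise
    return inter / union if union > 0 else 0.0


def box_iou(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def grounding_reward(pred: GroundingPred, gt: GroundingPred, w_time: float = 0.5) -> float:
    """Assumed spatiotemporal grounding reward: weighted mix of temporal and spatial IoU."""
    t = temporal_iou((pred.t_start, pred.t_end), (gt.t_start, gt.t_end))
    s = box_iou(pred.box, gt.box)
    return w_time * t + (1.0 - w_time) * s


def counting_reward(pred_count: int, true_count: int) -> float:
    """Assumed counting reward: 1 for an exact count, decaying linearly with relative error."""
    if pred_count == true_count:
        return 1.0
    rel_err = abs(pred_count - true_count) / max(true_count, 1)
    return max(0.0, 1.0 - rel_err)


if __name__ == "__main__":
    gt = GroundingPred(1.0, 3.0, (10.0, 10.0, 50.0, 50.0))
    pred = GroundingPred(1.5, 3.0, (12.0, 12.0, 48.0, 50.0))
    print(f"grounding reward: {grounding_reward(pred, gt):.3f}")
    print(f"counting reward:  {counting_reward(3, 5):.3f}")
```

For example, `grounding_reward` returns 1.0 when the prediction matches the ground truth exactly, and `counting_reward(3, 5)` gives 0.6. IoU-style rewards give partial credit for near-miss predictions, which a binary correct/incorrect signal would not; whether the paper uses continuous or thresholded rewards is not stated in the abstract.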

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning