LAVID: An Agentic LVLM Framework for Diffusion-Generated Video Detection
Qingyuan Liu, Yun-Yun Tsai, Ruijian Zha, Victoria Li, Pengyuan Shi, Chengzhi Mao, Junfeng Yang

TL;DR
LAVID introduces a training-free framework for detecting AI-generated videos with large vision-language models (LVLMs). It leverages external tools and adaptive prompt structuring, and it substantially improves detection accuracy on a new benchmark.
Contribution
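The paper presents LAVID, a novel training-free, LVLM-based method for detecting AI-generated videos. It uses explicit knowledge gathered by external tools together with adaptive prompts, addressing limitations of traditional deep learning approaches such as a lack of transparency and an inability to recognize new artifacts.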
Findings
LAVID improves F1 scores by 6.2 to 30.2% over top baselines.
The method is fully inference-based, avoiding additional training.
Evaluation on the newly introduced VidFor dataset demonstrates the method's effectiveness.
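The findings are reported as F1 scores, the harmonic mean of precision and recall. As a minimal sketch of how that metric is computed for a binary real-vs-generated task, the function below treats "AI-generated" (label 1) as the positive class; that label convention and the function name `f1_score` are assumptions for illustration, not details taken from the paper.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = 2PR / (P + R) for one positive class (here: 1 = AI-generated, an assumed convention)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # generated, flagged
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # real, wrongly flagged
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # generated, missed
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined), so F1 is taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels: 3 generated clips, 2 real ones; TP = 2, FP = 1, FN = 1.
print(round(f1_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]), 3))  # 0.667
```

Because F1 ignores true negatives, it rewards a detector for catching generated videos without flooding the output with false alarms on real ones.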
Abstract
The impressive achievements of generative models in creating high-quality videos have raised concerns about digital integrity and privacy vulnerabilities. AI-generated content detection has been widely studied in the image field (e.g., deepfakes), yet the video field remains largely unexplored. Large Vision Language Models (LVLMs) have become an emerging tool for AI-generated content detection thanks to their strong reasoning and multimodal capabilities. They overcome limitations faced by traditional deep-learning-based methods, such as a lack of transparency and an inability to recognize new artifacts. Motivated by this, we propose LAVID, a novel LVLM-based AI-generated video detection method with explicit knowledge enhancement. Our insights are as follows: (1) leading LVLMs can call external tools to extract useful information that facilitates their own video detection task; (2) structuring the…
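Insight (1) describes an LVLM that calls external tools and feeds their outputs back into its own detection prompt. The sketch below illustrates that general tool-augmented pattern under stated assumptions; it is not the authors' implementation. The `lvlm` callable, the `ToolAugmentedDetector` class, and the tool names (`optical_flow`, `depth`, `ocr`) are all hypothetical stand-ins, and the truncated insight (2) on prompt structuring is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical tool signature: takes a video path, returns a textual observation.
Tool = Callable[[str], str]


@dataclass
class ToolAugmentedDetector:
    """Hypothetical sketch: the LVLM picks tools, their outputs become explicit
    knowledge in the prompt, and the LVLM then gives a yes/no verdict."""
    lvlm: Callable[[str, str], str]  # assumed interface: (prompt, video_path) -> text
    tools: Dict[str, Tool]
    max_tools: int = 3

    def detect(self, video_path: str) -> bool:
        # Step 1: let the model choose which tools to call (comma-separated names).
        menu = ", ".join(sorted(self.tools))
        choice = self.lvlm(
            f"Available tools: {menu}. Which tools would help decide whether "
            "this video is AI-generated? Answer with a comma-separated list.",
            video_path,
        )
        selected = [name.strip() for name in choice.split(",")
                    if name.strip() in self.tools][: self.max_tools]
        # Step 2: run the selected tools and collect their outputs as extra knowledge.
        knowledge = [f"[{name}] {self.tools[name](video_path)}" for name in selected]
        # Step 3: ask for the final verdict, conditioned on the gathered knowledge.
        answer = self.lvlm(
            "Extra information:\n" + "\n".join(knowledge)
            + "\nIs this video AI-generated? Answer yes or no.",
            video_path,
        )
        return answer.strip().lower().startswith("yes")


# Usage with a stubbed LVLM and stubbed tools (all hypothetical).
def fake_lvlm(prompt: str, video_path: str) -> str:
    if "Which tools" in prompt:
        return "optical_flow, depth"
    return "yes" if "inconsistent" in prompt else "no"


tools = {
    "optical_flow": lambda v: "motion is temporally inconsistent",
    "depth": lambda v: "depth map looks plausible",
    "ocr": lambda v: "no text found",
}
print(ToolAugmentedDetector(fake_lvlm, tools).detect("clip.mp4"))  # True
```

Keeping tool selection inside the model call, rather than running every tool on every video, mirrors the abstract's framing that the LVLM itself decides what external information would help; a real system would replace the stubs with an actual LVLM and real video-analysis tools.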
