GenVideoLens: Where LVLMs Fall Short in AI-Generated Video Detection?
Yueying Zou, Pei Pei Li, Zekun Li, Xinyu Guo, Xing Cui, Huaibo Huang, Ran He

TL;DR
GenVideoLens is a detailed benchmark that evaluates LVLMs' ability to detect AI-generated videos across multiple authenticity dimensions, revealing strengths and weaknesses in current models.
Contribution
The paper introduces GenVideoLens, a fine-grained benchmark with expert annotations across 15 dimensions, enabling detailed evaluation of LVLMs in AI-generated video detection.
Findings
LVLMs perform well on perceptual cues but struggle with optical, physical, and temporal cues.
Model performance varies widely across dimensions, with smaller models sometimes outperforming larger ones.
Current LVLMs make only limited use of temporal information when detecting AI-generated videos.
Abstract
In recent years, AI-generated videos have become increasingly realistic and sophisticated. Meanwhile, Large Vision-Language Models (LVLMs) have shown strong potential for detecting such content. However, existing evaluation protocols largely treat the task as a binary classification problem and rely on coarse-grained metrics such as overall accuracy, providing limited insight into where LVLMs succeed or fail. To address this limitation, we introduce GenVideoLens, a fine-grained benchmark that enables dimension-wise evaluation of LVLM capabilities in AI-generated video detection. The benchmark contains 400 highly deceptive AI-generated videos and 100 real videos, annotated by experts across 15 authenticity dimensions covering perceptual, optical, physical, and temporal cues. We evaluate eleven representative LVLMs on this benchmark. Our analysis reveals a pronounced dimensional…
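To make the contrast between a single overall accuracy and dimension-wise evaluation concrete, here is a minimal sketch in Python. It is not the benchmark's actual scoring code. The record layout and the function names are assumptions made for illustration, and the toy values are invented rather than taken from the paper. Only the four cue groups (perceptual, optical, physical, temporal) come from the abstract.

```python
from collections import defaultdict

# Toy records, invented for illustration only (NOT results from the paper).
# Each record pairs a cue group with whether a hypothetical model judged
# the corresponding AI-generated video correctly.
records = [
    ("perceptual", True),
    ("perceptual", True),
    ("optical", False),
    ("physical", False),
    ("physical", True),
    ("temporal", False),
]


def overall_accuracy(records):
    """Coarse-grained metric: one number over all records."""
    return sum(correct for _, correct in records) / len(records)


def per_dimension_accuracy(records):
    """Fine-grained metric: accuracy computed separately per cue group."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for dimension, correct in records:
        totals[dimension] += 1
        hits[dimension] += int(correct)
    return {dimension: hits[dimension] / totals[dimension] for dimension in totals}


if __name__ == "__main__":
    print("overall:", overall_accuracy(records))
    for dimension, acc in per_dimension_accuracy(records).items():
        print(f"{dimension}: {acc:.2f}")
```

On this toy data the overall figure sits in the middle while the per-group figures range from all correct to all wrong, which is the kind of gap a single accuracy number hides.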
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)
