TL;DR
This paper introduces a new dataset and detection framework for AI-generated videos that preserves high-frequency forgery artifacts by operating at native scale, improving detection accuracy over existing methods.
Contribution
It presents a large-scale dataset and a novel Vision Transformer-based detection framework that operates at native resolution, addressing limitations of prior preprocessing-dependent methods.
Findings
The proposed method outperforms existing detection techniques across multiple benchmarks.
Native-scale processing preserves subtle forgery artifacts better than traditional preprocessing.
The new dataset enables more realistic evaluation of AI-generated video detection methods.
Abstract
The rapid advancement of video generation models has enabled the creation of highly realistic synthetic media, raising significant societal concerns regarding the spread of misinformation. However, current detection methods suffer from critical limitations. They rely on preprocessing operations like fixed-resolution resizing and cropping. These operations not only discard subtle, high-frequency forgery traces but also cause spatial distortion and significant information loss. Furthermore, existing methods are often trained and evaluated on outdated datasets that fail to capture the sophistication of modern generative models. To address these challenges, we introduce a comprehensive dataset and a novel detection framework. First, we curate a large-scale dataset of over 140K videos from 15 state-of-the-art open-source and commercial generators, along with the Magic Videos benchmark designed…
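The claim that fixed-resolution resizing discards high-frequency traces can be illustrated with a toy example. The sketch below is not the paper's pipeline: it assumes a hypothetical frame made of a smooth gradient plus a faint pixel-level checkerboard standing in for a forgery artifact, and it uses simple 2x2 block averaging as a stand-in for a resize step. The helper name `block_average_resize` and all constants are chosen for illustration only.

```python
import numpy as np


def block_average_resize(frame: np.ndarray, factor: int) -> np.ndarray:
    """Downscale a 2-D frame by averaging non-overlapping factor x factor blocks.

    This is a simplified stand-in for a fixed-resolution resize, not the
    preprocessing used by any particular detector.
    """
    h, w = frame.shape
    h_crop, w_crop = h - h % factor, w - w % factor
    blocks = frame[:h_crop, :w_crop].reshape(
        h_crop // factor, factor, w_crop // factor, factor
    )
    return blocks.mean(axis=(1, 3))


# Hypothetical 64x64 frame: smooth horizontal gradient (the "content").
h, w = 64, 64
yy, xx = np.mgrid[0:h, 0:w]
smooth = xx / (w - 1)

# Hypothetical artifact: a faint checkerboard alternating between -0.05 and +0.05.
artifact = 0.05 * (((xx + yy) % 2) * 2 - 1)
frame = smooth + artifact

# Artifact strength at native resolution.
native_amplitude = float(np.abs(frame - smooth).max())

# Artifact strength after resizing: compare the resized frame with the
# resized artifact-free content.
resized_residual = block_average_resize(frame, 2) - block_average_resize(smooth, 2)
resized_amplitude = float(np.abs(resized_residual).max())

print(f"artifact amplitude at native scale: {native_amplitude:.4f}")
print(f"artifact amplitude after 2x downscale: {resized_amplitude:.2e}")
```

In this toy setting each 2x2 block contains two positive and two negative checkerboard values, so the averaging step cancels the artifact while leaving the smooth content largely intact. Real resizing filters and real forgery traces are less clean than this, but the example shows why a detector that only sees resized frames may have little high-frequency signal left to work with.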