Grab-3D: Detecting AI-Generated Videos from 3D Geometric Temporal Consistency
Wenhan Chen, Sezer Karaoglu, Theo Gevers

TL;DR
Grab-3D introduces a geometry-aware transformer that leverages 3D geometric temporal consistency to effectively detect AI-generated videos, outperforming existing methods and generalizing well across different generators.
Contribution
The paper presents Grab-3D, a transformer framework that uses 3D geometric features, specifically vanishing points, to detect AI-generated videos.
Findings
Grab-3D outperforms state-of-the-art detectors.
It generalizes robustly across domains, including across different generators.
It uses vanishing points as an explicit representation of 3D geometry.
Abstract
Recent advances in diffusion-based generation techniques enable AI models to produce highly realistic videos, heightening the need for reliable detection mechanisms. However, existing detection methods provide only limited exploration of the 3D geometric patterns present in generated videos. In this paper, we use vanishing points as an explicit representation of 3D geometry patterns, revealing fundamental discrepancies in geometric consistency between real and AI-generated videos. We introduce Grab-3D, a geometry-aware transformer framework for detecting AI-generated videos based on 3D geometric temporal consistency. To enable reliable evaluation, we construct an AI-generated video dataset of static scenes, allowing stable 3D geometric feature extraction. We propose a geometry-aware transformer equipped with geometric positional encoding, temporal-geometric attention, and an EMA-based…
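To make the vanishing-point idea concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the Grab-3D pipeline: it assumes line segments have already been detected in each frame, estimates a vanishing point as the intersection of two image lines in homogeneous coordinates, and scores temporal consistency as the mean frame-to-frame displacement of that point. The names `vanishing_point` and `temporal_drift` are hypothetical helpers introduced only for this example.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]
Homog = Tuple[float, float, float]
Segment = Tuple[Point, Point]


def cross(a: Homog, b: Homog) -> Homog:
    """3D cross product, used for both line construction and intersection."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def line_through(p: Point, q: Point) -> Homog:
    """Homogeneous line passing through two image points."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))


def vanishing_point(seg_a: Segment, seg_b: Segment, eps: float = 1e-9) -> Point:
    """Intersect the lines supporting two segments (hypothetical helper).

    Raises ValueError when the lines are parallel in the image, i.e. the
    vanishing point lies at infinity.
    """
    x, y, w = cross(line_through(*seg_a), line_through(*seg_b))
    if abs(w) < eps:
        raise ValueError("lines are parallel in the image")
    return (x / w, y / w)


def temporal_drift(points: Sequence[Point]) -> float:
    """Mean Euclidean displacement of a vanishing point between frames.

    Assumption for this sketch: in a static scene filmed by a camera that
    does not rotate, the vanishing point of a fixed 3D direction stays put,
    so a large drift hints at geometric inconsistency.
    """
    if len(points) < 2:
        return 0.0
    steps = [
        math.hypot(b[0] - a[0], b[1] - a[1])
        for a, b in zip(points, points[1:])
    ]
    return sum(steps) / len(steps)
```

For example, `vanishing_point(((0, 0), (1, 1)), ((0, 2), (1, 1)))` intersects the two supporting lines at `(1.0, 1.0)`. A real detector would fit vanishing points robustly from many segments per frame; two segments are used here only to keep the geometry visible.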
Taxonomy
Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis
