TL;DR
This paper introduces ATSS, a detection method for AI-generated videos that exploits their anomalous temporal self-similarity and outperforms existing detectors on four large-scale benchmarks.
Contribution
The paper presents a multimodal detection framework that exploits the deterministic temporal patterns of AI-generated videos through a triple-similarity representation and a cross-attentive fusion mechanism.
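This summary does not say how the cross-attentive fusion is built. As a generic illustration only, the sketch below shows standard scaled dot-product cross-attention, where tokens from one stream (labeled "visual" here) act as queries over tokens from another stream (labeled "semantic"). The stream names, the omission of learned query/key/value projections, and the function `cross_attention` are assumptions made for illustration. This is not the authors' architecture.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Generic scaled dot-product cross-attention (hypothetical stand-in).

    Queries come from one stream and keys/values from another. Learned
    projections are deliberately omitted (an assumption for brevity), so
    each output row is a softmax-weighted average of the value rows.
    """
    d = len(keys[0])
    scale = 1.0 / math.sqrt(d)
    width = len(values[0])
    out: Matrix = []
    for q in queries:
        # Similarity of this query token to every key token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Convex combination of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(width)])
    return out


if __name__ == "__main__":
    visual = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical per-frame visual tokens
    semantic = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical semantic tokens
    fused = cross_attention(visual, semantic, semantic)
    print(fused)
```

Each visual token ends up as a mixture of semantic tokens weighted by their similarity, so one modality's evidence is re-expressed in terms of the other. A real fusion module would add learned projections, multiple heads, and residual connections.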
Findings
ATSS outperforms existing methods on four large-scale benchmarks.
It achieves higher AP, AUC, and ACC than existing detectors.
The method demonstrates strong generalization across diverse video generation models.
Abstract
AI-generated videos (AIGVs) have achieved unprecedented photorealism, posing severe threats to digital forensics. Existing AIGV detectors focus mainly on localized artifacts or short-term temporal inconsistencies, and thus often fail to capture the underlying generative logic governing global temporal evolution, which limits AIGV detection performance. In this paper, we identify a distinctive fingerprint in AIGVs, termed anomalous temporal self-similarity (ATSS). Unlike real videos, which exhibit stochastic natural dynamics, AIGVs follow deterministic trajectories driven by anchors (e.g., text or image prompts), inducing unnaturally repetitive correlations across visual and semantic domains. To exploit this fingerprint, we propose the ATSS method, a multimodal detection framework built on a triple-similarity representation and a cross-attentive fusion mechanism. Specifically, ATSS…
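The notion of temporal self-similarity can be made concrete with a frame-by-frame similarity matrix. The sketch below assumes each frame has already been mapped to an embedding vector by some unspecified encoder, computes the T x T matrix of pairwise cosine similarities, and summarizes it by the mean off-diagonal value, where higher values indicate more repetitive temporal correlation. The abstract is truncated before it describes the three similarity views, so this covers only one generic view. The helper names and the mean-off-diagonal score are hypothetical and are not the paper's detector.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; an all-zero vector is treated as uncorrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def self_similarity_matrix(frames: List[Sequence[float]]) -> List[List[float]]:
    """T x T matrix whose (i, j) entry compares frame i with frame j.

    `frames` are per-frame embeddings from an assumed, unspecified encoder.
    """
    return [[cosine(fi, fj) for fj in frames] for fi in frames]


def mean_off_diagonal(sim: List[List[float]]) -> float:
    """Average similarity between distinct time steps (hypothetical score)."""
    t = len(sim)
    if t < 2:
        raise ValueError("need at least two frames")
    total = sum(sim[i][j] for i in range(t) for j in range(t) if i != j)
    return total / (t * (t - 1))


if __name__ == "__main__":
    # Toy embeddings: a near-repetitive clip vs. one whose content drifts.
    repetitive = [[1.0, 0.10], [1.0, 0.12], [1.0, 0.09], [1.0, 0.11]]
    drifting = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [-0.5, 0.5]]
    for name, clip in (("repetitive", repetitive), ("drifting", drifting)):
        print(name, mean_off_diagonal(self_similarity_matrix(clip)))
```

On these toy inputs, the repetitive clip scores close to 1, and the drifting clip scores much lower. A single scalar like this is only a way to build intuition. The paper's method combines multiple similarity views with learned fusion rather than thresholding one statistic.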
