SpecSem-Net: Integrating Spectral and Semantic Features for Robust AI-generated Video Detection
Zixi Wei, Huixuan Zhang, Xiaojun Wan

TL;DR
SpecSem-Net introduces a spectral and semantic feature integration framework with spectral denoising for improved detection of high-fidelity AI-generated videos, addressing limitations of existing methods.
Contribution
It is the first framework to apply semantic-guided spectral denoising to AI-generated video detection, and it provides a new benchmark for evaluation.
Findings
Achieves 87.25% accuracy on the new benchmark.
Achieves 95.59% accuracy on public datasets.
Outperforms existing detection methods.
Abstract
The remarkable visual fidelity of recent commercial video generative models, such as Sora and Veo, makes robust AI-generated video detection increasingly essential: synthetic content that is indistinguishable from real video can be exploited for disinformation. However, existing detectors often fail because they rely too heavily on semantic features, which are becoming increasingly realistic, and neglect subtle spectral artifacts. In this paper, we propose SpecSem-Net, the first framework to introduce a semantic-guided spectral denoising mechanism specifically for high-fidelity AI-generated video detection. We design a spectral module that extracts high-frequency features via Fourier-transform-based filtering. Furthermore, to reduce misjudgments arising from spectral noise, we employ a Gated Merging Mechanism that adaptively fuses semantic context.…
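The two ideas named in the abstract, Fourier-transform-based high-frequency filtering and gated fusion with semantic features, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `high_pass_filter` and `gated_merge`, the `cutoff` radius, the gate parameters `w_gate` and `b_gate`, and the convex-combination form of the gate are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np


def high_pass_filter(frame, cutoff=0.1):
    """Return the high-frequency part of a 2D grayscale frame.

    `cutoff` is a hypothetical radius in normalized frequency
    (cycles per pixel); components at or below it are zeroed out.
    """
    h, w = frame.shape
    # Centered 2D spectrum of the frame.
    spectrum = np.fft.fftshift(np.fft.fft2(frame))
    # Matching centered frequency grids, each in [-0.5, 0.5).
    fy = np.fft.fftshift(np.fft.fftfreq(h))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]
    radius = np.sqrt(fy ** 2 + fx ** 2)
    # Keep only frequencies outside the low-frequency disk.
    mask = radius > cutoff
    filtered = spectrum * mask
    return np.real(np.fft.ifft2(np.fft.ifftshift(filtered)))


def gated_merge(spectral, semantic, w_gate, b_gate):
    """Fuse two feature vectors with a learned sigmoid gate (assumed form).

    gate = sigmoid([spectral, semantic] @ w_gate + b_gate)
    out  = gate * spectral + (1 - gate) * semantic
    """
    z = np.concatenate([spectral, semantic], axis=-1) @ w_gate + b_gate
    gate = 1.0 / (1.0 + np.exp(-z))
    return gate * spectral + (1.0 - gate) * semantic


# Usage on random data (shapes are illustrative only).
rng = np.random.default_rng(0)
frame = rng.random((64, 64))
high_freq = high_pass_filter(frame, cutoff=0.1)

spectral_feat = rng.random((1, 8))
semantic_feat = rng.random((1, 8))
w_gate = rng.normal(size=(16, 8))
b_gate = np.zeros(8)
fused = gated_merge(spectral_feat, semantic_feat, w_gate, b_gate)
```

In this sketch the gate lets the semantic features decide, per dimension, how much of the spectral signal to pass through, which is one plausible way to suppress spectral noise; the paper's actual gating design may differ.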
