SpecSem-Net: Integrating Spectral and Semantic Features for Robust AI-generated Video Detection

Zixi Wei; Huixuaun Zhang; Xiaojun Wan

arXiv:2605.17311·cs.CV·May 19, 2026

TL;DR

SpecSem-Net introduces a spectral and semantic feature integration framework with spectral denoising for improved detection of high-fidelity AI-generated videos, addressing limitations of existing methods.

Contribution

SpecSem-Net is presented as the first framework to apply semantic-guided spectral denoising to robust AI-generated video detection. The paper also provides a new benchmark for evaluation.

Findings

01 Achieves 87.25% accuracy on the new benchmark.

02 Achieves 95.59% accuracy on public datasets.

03 Outperforms existing detection methods.

Abstract

The remarkable visual fidelity of recent commercial video generative models, such as Sora and Veo, renders robust AI-generated video detection increasingly essential to prevent synthetic content from being indistinguishable from real videos and exploited for disinformation. However, existing detectors often fail due to an over-reliance on increasingly realistic semantic features, neglecting subtle spectral artifacts. In this paper, we propose SpecSem-Net, the first framework to introduce a semantic-guided spectral denoising mechanism specifically for high-fidelity AI-generated video detection. Specifically, we design a spectral module to extract high-frequency features via Fourier-Transform based filtering. Furthermore, to reduce misjudgments arising from spectral noise, we employ a Gated Merging Mechanism to adaptively fuse semantic context, effectively mitigating spectral noise.…
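
The two ideas named in the abstract, Fourier-transform-based high-frequency filtering and gated fusion with semantic context, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the filter shape, the cutoff, or the form of the gate, so the hard radial mask, the `cutoff` value, the sigmoid gate, and the function names `high_pass_fft` and `gated_merge` are all assumptions made for illustration.

```python
import numpy as np


def high_pass_fft(frame: np.ndarray, cutoff: float = 0.1) -> np.ndarray:
    """Keep only spatial frequencies whose radius exceeds `cutoff`.

    `frame` is a 2D grayscale array. Frequencies are in cycles per pixel,
    so each axis spans [-0.5, 0.5). The hard radial mask and the default
    cutoff are illustrative assumptions, not values from the paper.
    """
    h, w = frame.shape
    spectrum = np.fft.fft2(frame)
    fy = np.fft.fftfreq(h)[:, None]   # vertical frequency of each FFT bin
    fx = np.fft.fftfreq(w)[None, :]   # horizontal frequency of each FFT bin
    mask = np.sqrt(fy ** 2 + fx ** 2) > cutoff
    # The mask is symmetric in frequency, so the inverse transform is real
    # up to floating-point error; drop the negligible imaginary part.
    return np.fft.ifft2(spectrum * mask).real


def gated_merge(spectral, semantic, w_gate, b_gate):
    """Fuse two feature vectors with a learned sigmoid gate (assumed form).

    gate = sigmoid([spectral; semantic] @ w_gate + b_gate)
    out  = gate * spectral + (1 - gate) * semantic
    """
    joint = np.concatenate([spectral, semantic], axis=-1)
    gate = 1.0 / (1.0 + np.exp(-(joint @ w_gate + b_gate)))
    return gate * spectral + (1.0 - gate) * semantic


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.random((64, 64))                 # stand-in for one video frame
    residual = high_pass_fft(frame, cutoff=0.25)
    print("high-frequency residual shape:", residual.shape)

    d = 4                                        # hypothetical feature size
    spectral_feat = rng.standard_normal(d)
    semantic_feat = rng.standard_normal(d)
    fused = gated_merge(
        spectral_feat,
        semantic_feat,
        0.1 * rng.standard_normal((2 * d, d)),   # untrained gate weights
        np.zeros(d),
    )
    print("fused feature shape:", fused.shape)
```

In this sketch the gate decides, per feature dimension, how much of the spectral signal to trust relative to the semantic signal, which is one plausible way to read "adaptively fuse semantic context". In a trained model the gate weights would be learned rather than sampled at random.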
