Detecting AI-Generated Videos with Spiking Neural Networks
Minsuk Jang, Yujin Yang, Heeseon Kim, Minseok Son, Younghun Kim, Changick Kim

TL;DR
This paper introduces MAST, a detector based on spiking neural networks (SNNs) that identifies AI-generated videos by exploiting temporal residuals and semantic features, achieving high cross-generator accuracy.
Contribution
The study demonstrates that SNNs naturally respond to localized temporal artifacts, enabling robust detection of AI-generated videos across different generators.
Findings
MAST achieves 93.14% accuracy on the GenVideo benchmark.
SNNs respond to temporal artifacts at object and motion boundaries.
MAST outperforms or matches state-of-the-art ANN detectors in cross-generator tests.
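To make the second finding concrete, here is a minimal sketch of how a leaky integrate-and-fire (LIF) neuron, the basic unit of many SNNs, reacts to frame-to-frame change at a single pixel. This is not the MAST architecture: the function name `lif_spike_counts`, the `threshold` and `leak` values, and the choice to drive each neuron with the absolute pixel residual are all assumptions made for illustration.

```python
def lif_spike_counts(frames, threshold=1.0, leak=0.5):
    """Count spikes per pixel for a toy LIF layer driven by temporal residuals.

    frames: list of T frames, each a flat list of pixel intensities.
    threshold, leak: hypothetical LIF parameters (not taken from the paper).
    """
    n = len(frames[0])
    membrane = [0.0] * n   # membrane potential, one neuron per pixel
    spikes = [0] * n       # spike count per pixel

    for prev, curr in zip(frames, frames[1:]):
        for i in range(n):
            residual = abs(curr[i] - prev[i])            # frame-to-frame change
            membrane[i] = leak * membrane[i] + residual  # leaky integration
            if membrane[i] >= threshold:
                spikes[i] += 1
                membrane[i] = 0.0                        # hard reset after firing
    return spikes


# Pixel 0 is static; pixel 1 flickers between 0 and 1 on every frame.
frames = [[0.5, 0.0], [0.5, 1.0], [0.5, 0.0], [0.5, 1.0]]
print(lif_spike_counts(frames))  # [0, 3]
```

In this toy setting the static pixel never fires while the flickering pixel fires on every transition, which is the sense in which spiking units respond only where temporal change is concentrated.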
Abstract
Modern AI-generated videos are photorealistic at the single-frame level, leaving inter-frame dynamics as the main remaining axis for detection. Existing detectors typically handle this temporal evidence in three ways: feeding the full frame sequence to a generic temporal backbone, reducing one dominant temporal cue to fixed video-level descriptors, or comparing temporal features to real-video statistics through a detection metric. These strategies degrade sharply under cross-generator evaluation, where artifact type and timescale vary across generators. On the caption-paired benchmark GenVidBench, we identify two signatures that prior detectors do not jointly exploit: AI-generated videos exhibit smoother frame-to-frame temporal residuals at the pixel level, and more compact trajectories in the semantic feature space, indicating a temporal smoothness gap at both levels. We further observe…
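The two signatures named in the abstract can be illustrated with simple summary statistics. The sketch below is an assumption-laden reading, not the paper's measurement procedure: `mean_abs_residual` stands in for pixel-level residual smoothness, and `trajectory_length` (the summed Euclidean distance between consecutive per-frame feature vectors) is one plausible proxy for how compact a semantic trajectory is. Both function names are hypothetical, and the feature vectors would in practice come from some pretrained encoder that is not specified here.

```python
import math


def mean_abs_residual(frames):
    """Mean absolute frame-to-frame pixel difference.

    frames: list of T frames, each a flat list of pixel intensities.
    Lower values indicate smoother temporal residuals.
    """
    total, count = 0.0, 0
    for prev, curr in zip(frames, frames[1:]):
        for a, b in zip(prev, curr):
            total += abs(b - a)
            count += 1
    return total / count


def trajectory_length(features):
    """Summed Euclidean distance between consecutive per-frame feature vectors.

    features: list of T feature vectors (one per frame).
    Shorter paths are used here as a stand-in for 'more compact' trajectories.
    """
    return sum(math.dist(p, q) for p, q in zip(features, features[1:]))


# Toy inputs: three 2-pixel frames and three 2-D feature vectors.
print(mean_abs_residual([[0, 0], [1, 1], [1, 3]]))    # 1.0
print(trajectory_length([(0, 0), (3, 4), (3, 4)]))    # 5.0
```

Under the abstract's claim, one would expect both statistics to be lower on average for generated videos than for real ones, though any threshold separating the two would have to be estimated from data.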
