Skyra: AI-Generated Video Detection via Grounded Artifact Reasoning

Yifei Li; Wenzhao Zheng; Yanran Zhang; Runze Sun; Yu Zheng; Lei Chen; Jie Zhou; Jiwen Lu

arXiv:2512.15693 · cs.CV · May 18, 2026

TL;DR

Skyra is a multimodal large language model designed to detect AI-generated videos by identifying visual artifacts and providing human-understandable explanations, supported by a new large-scale dataset and benchmark.

Contribution

The paper introduces Skyra, a novel model that detects and explains AI-generated videos using grounded artifact reasoning, along with a large dataset and benchmark for evaluation.

Findings

1. Skyra outperforms existing detection methods on multiple benchmarks.
2. The model provides human-interpretable explanations for its detections.
3. The new ViF-CoT-4K dataset supplies fine-grained human annotations of video artifacts for supervised fine-tuning.

Abstract

The misuse of AI-driven video generation technologies has raised serious social concerns, highlighting the urgent need for reliable AI-generated video detectors. However, most existing methods are limited to binary classification and lack the necessary explanations for human interpretation. In this paper, we present Skyra, a specialized multimodal large language model (MLLM) that identifies human-perceivable visual artifacts in AI-generated videos and leverages them as grounded evidence for both detection and explanation. To support this objective, we construct ViF-CoT-4K for Supervised Fine-Tuning (SFT), which represents the first large-scale AI-generated video artifact dataset with fine-grained human annotations. We then develop a two-stage training strategy that systematically enhances our model's spatio-temporal artifact perception, explanation capability, and detection accuracy. To…
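To make "grounded evidence" concrete, the sketch below shows one way a spatio-temporally grounded artifact could be represented: a category, a time span, and a normalized bounding box. This is a minimal illustration under stated assumptions, not Skyra's output format or the ViF-CoT-4K schema. The class names `ArtifactEvidence` and `GroundedVerdict`, the field layout, and the artifact category are all hypothetical. The "fake if any artifact is found" rule is also a simplification; in the paper, the MLLM itself produces the verdict and explanation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ArtifactEvidence:
    """One grounded artifact (hypothetical schema, not the real ViF-CoT-4K format)."""
    category: str                            # illustrative artifact label
    start_s: float                           # start of the time span, in seconds
    end_s: float                             # end of the time span, in seconds
    box: Tuple[float, float, float, float]   # normalized (x1, y1, x2, y2) in [0, 1]
    description: str                         # human-readable explanation

    def __post_init__(self) -> None:
        # Reject time spans that run backwards or start before 0.
        if not 0 <= self.start_s <= self.end_s:
            raise ValueError("time span must satisfy 0 <= start_s <= end_s")
        # Reject boxes that are not normalized or have non-positive area.
        x1, y1, x2, y2 = self.box
        if not (0 <= x1 < x2 <= 1 and 0 <= y1 < y2 <= 1):
            raise ValueError("box must be normalized with x1 < x2 and y1 < y2")


@dataclass
class GroundedVerdict:
    evidence: List[ArtifactEvidence] = field(default_factory=list)

    @property
    def label(self) -> str:
        # Hypothetical decision rule: "fake" if at least one grounded
        # artifact was found, otherwise "real".
        return "fake" if self.evidence else "real"

    def explain(self) -> str:
        # Build a short explanation that cites each artifact's time span and region.
        if not self.evidence:
            return "real: no perceivable artifacts found"
        lines = [f"{self.label}: {len(self.evidence)} artifact(s) found"]
        for ev in self.evidence:
            lines.append(
                f"- [{ev.start_s:.1f}s to {ev.end_s:.1f}s] {ev.category} "
                f"at {ev.box}: {ev.description}"
            )
        return "\n".join(lines)
```

Usage, with made-up evidence:

```python
v = GroundedVerdict([
    ArtifactEvidence("hand deformation", 1.2, 2.0, (0.40, 0.55, 0.52, 0.70),
                     "a finger merges into the cup handle"),
])
print(v.label)  # fake
print(v.explain())
```

Tying each claim to a time span and image region makes the explanation checkable by a human viewer, which is the motivation the abstract gives for grounding detection in perceivable artifacts.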

Code & Models

Repositories

joeleelyf/Skyra (GitHub)

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Generative Adversarial Networks and Image Synthesis