RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO
Yanzuo Lu, Ronglai Zuo, Jiankang Deng

TL;DR
RAVEN pairs a training framework with a reinforcement learning method (CM-GRPO) for real-time autoregressive video extrapolation, targeting the gap between training-time and inference-time history that limits long-horizon generation quality.
Contribution
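The paper proposes RAVEN, a training-time test framework that repacks each self rollout into an interleaved sequence of clean historical endpoints and noisy denoising states, so that downstream chunk losses also supervise the history representations, and CM-GRPO, a reinforcement learning approach that further improves extrapolation quality.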
Findings
RAVEN outperforms recent causal video distillation baselines in quality and semantic evaluations.
CM-GRPO provides additional gains when combined with RAVEN.
Experiments show improved long-horizon video generation quality.
Abstract
Causal autoregressive video diffusion models support real-time streaming generation by extrapolating future chunks from previously generated content. Distilling such generators from high-fidelity bidirectional teachers yields competitive few-step models, yet a persistent gap between the history distributions encountered during training and those arising at inference constrains generation quality over long horizons. We introduce the Real-time Autoregressive Video Extrapolation Network (RAVEN), a training-time test framework that repacks each self rollout into an interleaved sequence of clean historical endpoints and noisy denoising states. This formulation aligns training attention with inference-time extrapolation and allows downstream chunk losses to supervise the history representations on which future predictions depend. We further propose Consistency-model Group Relative Policy…
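The repacking described in the abstract can be pictured as a block attention mask over an interleaved sequence. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the layout `[noisy_0, clean_0, noisy_1, clean_1, ...]`, the rule that each block attends to itself plus the clean endpoints of earlier chunks, and the names `interleaved_mask`, `num_chunks`, and `tokens_per_chunk` are all hypothetical choices made for this example.

```python
import numpy as np


def interleaved_mask(num_chunks: int, tokens_per_chunk: int) -> np.ndarray:
    """Boolean attention mask (True = may attend) for an interleaved rollout.

    ASSUMED layout (hypothetical, not taken from the paper):
        [noisy_0, clean_0, noisy_1, clean_1, ...]
    ASSUMED rule: every block attends to itself and to the clean
    endpoints of strictly earlier chunks, mimicking inference-time
    extrapolation from previously generated (clean) history.
    """
    n_blocks = 2 * num_chunks
    block_mask = np.zeros((n_blocks, n_blocks), dtype=bool)
    for k in range(num_chunks):
        noisy_k, clean_k = 2 * k, 2 * k + 1
        # Each block always sees its own tokens.
        block_mask[noisy_k, noisy_k] = True
        block_mask[clean_k, clean_k] = True
        # Both blocks of chunk k see the clean endpoints of earlier chunks.
        for j in range(k):
            clean_j = 2 * j + 1
            block_mask[noisy_k, clean_j] = True
            block_mask[clean_k, clean_j] = True
    # Expand the block-level mask to token resolution.
    token_mask = np.repeat(block_mask, tokens_per_chunk, axis=0)
    token_mask = np.repeat(token_mask, tokens_per_chunk, axis=1)
    return token_mask


if __name__ == "__main__":
    # Two chunks, one token each: rows are queries, columns are keys.
    print(interleaved_mask(2, 1).astype(int))
```

Under these assumptions, the noisy state of a later chunk never attends to the noisy state of an earlier one, only to its clean endpoint. That is one way gradients from a later chunk's loss could reach the history representation it conditions on.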
