Video Models Can Reason with Verifiable Rewards

Tinghui Zhu; Sheng Zhang; James Y. Huang; Selena Song; Xiaofei Wen; Yuankai Li; Hoifung Poon; Muhao Chen

arXiv:2605.15458·cs.CV·May 18, 2026



1 repo · 2 models · 1 dataset

TL;DR

This paper introduces VideoRLVR, a reinforcement learning approach for optimizing video diffusion models to perform verifiable reasoning tasks with explicit constraints, improving reliability and rule consistency.

Contribution

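The paper presents a recipe for optimizing video diffusion models with rule-based feedback. It combines an SDE-GRPO optimization backbone, dense decomposed rewards, and an Early-Step Focus strategy that restricts policy optimization to the early denoising phase to make training more efficient.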

Findings

01. VideoRLVR outperforms supervised baselines on Maze, FlowFree, and Sokoban.

02. Dense decomposed rewards improve performance in low-success-rate scenarios.

03. The approach surpasses proprietary and open-source models on reasoning benchmarks.
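
To make the idea of a dense, decomposed, rule-based reward concrete, here is a minimal sketch for a maze task. It is not the paper's reward: the function `maze_reward`, the three components (`valid`, `progress`, `success`), their equal weighting, and the assumption that the generated video has already been decoded into a list of grid cells are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # (row, column) on the maze grid


def manhattan(a: Cell, b: Cell) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def maze_reward(grid: List[str], path: List[Cell],
                start: Cell, goal: Cell) -> Dict[str, float]:
    """Score a decoded trajectory with rule-based checks (illustrative only).

    grid: rows of '#' (wall) and '.' (free). path: cells visited in order.
    """
    def free(cell: Cell) -> bool:
        r, c = cell
        return 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] != "#"

    scores = {"valid": 0.0, "progress": 0.0, "success": 0.0, "total": 0.0}
    if not path or path[0] != start or not free(start):
        return scores

    # Count the legal prefix: each step must move to an adjacent free cell.
    legal_steps = 0
    for prev, cur in zip(path, path[1:]):
        if manhattan(prev, cur) != 1 or not free(cur):
            break
        legal_steps += 1
    total_steps = len(path) - 1

    # Dense component 1: fraction of steps that obey the rules.
    scores["valid"] = 1.0 if total_steps == 0 else legal_steps / total_steps

    # Dense component 2: how much closer to the goal the legal prefix gets.
    start_dist = manhattan(start, goal)
    end_dist = manhattan(path[legal_steps], goal)
    scores["progress"] = 1.0 if start_dist == 0 else max(0.0, 1.0 - end_dist / start_dist)

    # Sparse component: the whole path is legal and ends on the goal.
    solved = legal_steps == total_steps and path[-1] == goal
    scores["success"] = 1.0 if solved else 0.0

    # Equal weighting is an arbitrary assumption, not a value from the paper.
    scores["total"] = (scores["valid"] + scores["progress"] + scores["success"]) / 3.0
    return scores


if __name__ == "__main__":
    maze = ["..#",
            "...",
            "#.."]
    good = [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2)]
    bad = [(0, 0), (0, 1), (0, 2)]  # second step walks into a wall
    print(maze_reward(maze, good, (0, 0), (2, 2)))
    print(maze_reward(maze, bad, (0, 0), (2, 2)))
```

In this sketch a failed trajectory still earns partial credit from `valid` and `progress`, which is the kind of signal that a pure success/failure reward cannot provide when almost no samples solve the task.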

Abstract

Video diffusion models have made rapid progress in perceptual realism and temporal coherence, but they remain primarily optimized for plausible generation rather than verifiable reasoning. This limitation is especially pronounced in tasks where generated videos must satisfy explicit spatial, temporal, or logical constraints. Inspired by the role of reinforcement learning with verifiable rewards (RLVR) in reasoning-oriented language models, we introduce VideoRLVR, a practical recipe for optimizing video diffusion models with rule-based feedback. VideoRLVR formulates video reasoning as the generation of verifiable visual trajectories and consists of an SDE-GRPO optimization backbone, dense decomposed rewards, and an Early-Step Focus strategy for efficient training. The Early-Step Focus strategy restricts policy optimization to the early denoising phase, reducing training latency by about…
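
The two optimization ideas named in the abstract can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses the standard group-relative advantage from GRPO (reward minus group mean, divided by group standard deviation) and a PPO-style clipped surrogate, and it assumes per-step log-probability ratios are available from a stochastic sampler. The helper names, the `early_fraction` parameter, and the clipping value are hypothetical; the source does not state which fraction of denoising steps is optimized.

```python
import math
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against the other samples drawn for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def early_step_mask(num_steps: int, early_fraction: float) -> List[float]:
    """1.0 for the first denoising steps, 0.0 for the rest (fraction is an assumption)."""
    k = max(1, int(num_steps * early_fraction))
    return [1.0 if t < k else 0.0 for t in range(num_steps)]


def masked_clipped_objective(log_ratios: List[List[float]],
                             advantages: List[float],
                             mask: List[float],
                             clip: float = 0.2) -> float:
    """Average clipped surrogate over the unmasked (early) steps only.

    log_ratios[i][t] is log(pi_new / pi_old) for sample i at denoising step t.
    """
    total, count = 0.0, 0.0
    for sample_ratios, adv in zip(log_ratios, advantages):
        for log_ratio, m in zip(sample_ratios, mask):
            ratio = math.exp(log_ratio)
            clipped = min(max(ratio, 1.0 - clip), 1.0 + clip)
            total += m * min(ratio * adv, clipped * adv)
            count += m
    return total / count


if __name__ == "__main__":
    rewards = [1.0, 0.0, 0.0, 1.0]            # rule-based rewards for 4 sampled videos
    adv = group_relative_advantages(rewards)
    mask = early_step_mask(num_steps=8, early_fraction=0.25)  # arbitrary fraction
    log_ratios = [[0.0] * 8 for _ in rewards]  # new policy equals old policy
    print(adv, mask, masked_clipped_objective(log_ratios, adv, mask))
```

Because masked steps contribute nothing to the objective, a real training loop would not need to backpropagate through them, which is one plausible reading of how restricting optimization to the early denoising phase lowers training cost.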


Code & Models

Repositories

luka-group/VideoRLVR
github

Models

Datasets

DarthZhu/VideoRLVR-Data
dataset · 24k downloads
