Wan-R1: Verifiable-Reinforcement Learning for Video Reasoning

Ming Liu, Yunbei Zhang, Shilong Liu, Liwen Wang, and Wensheng Zhang

arXiv:2603.27866 · cs.CV · March 31, 2026


TL;DR

This paper introduces verifiable reward functions for reinforcement learning in video reasoning tasks, improving generalization and training stability in maze-solving and robotic navigation.

Contribution

The paper systematically studies reward design in RL for video reasoning and proposes verifiable rewards, which yield better robustness and generalization than multimodal reward models.
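The summary does not list the reward's components. As an illustration only, the sketch below shows what a verifiable, multi-component trajectory reward for a grid maze might look like. It assumes the agent's cell-by-cell path has already been decoded from the generated frames. The `Maze` type, `is_legal_step`, `trajectory_reward`, the three components (rule compliance, progress toward the goal, success) and their weights are all hypothetical choices, not taken from the paper.

```python
from dataclasses import dataclass

Cell = tuple[int, int]  # (row, col)


@dataclass(frozen=True)
class Maze:
    """Hypothetical grid maze: a cell is open if it is in bounds and not in `walls`."""
    height: int
    width: int
    walls: frozenset[Cell]
    start: Cell
    goal: Cell


def is_legal_step(maze: Maze, a: Cell, b: Cell) -> bool:
    # A legal step moves exactly one cell up, down, left or right into an open, in-bounds cell.
    in_bounds = 0 <= b[0] < maze.height and 0 <= b[1] < maze.width
    adjacent = abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1
    return in_bounds and adjacent and b not in maze.walls


def manhattan(a: Cell, b: Cell) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def trajectory_reward(maze: Maze, path: list[Cell],
                      w_valid: float = 0.3, w_progress: float = 0.2,
                      w_success: float = 0.5) -> float:
    """Hypothetical multi-component reward for a path decoded from generated frames."""
    if not path or path[0] != maze.start:
        return 0.0  # the trajectory must begin at the start cell
    steps = list(zip(path, path[1:]))
    # Component 1: fraction of steps that obey the maze rules (1.0 if the agent never moves).
    valid = sum(is_legal_step(maze, a, b) for a, b in steps) / len(steps) if steps else 1.0
    # Component 2: how much closer the final cell is to the goal, in Manhattan distance.
    start_dist = manhattan(maze.start, maze.goal)
    if start_dist == 0:
        progress = 1.0
    else:
        progress = max(0.0, 1.0 - manhattan(path[-1], maze.goal) / start_dist)
    # Component 3: binary success, awarded only for a fully legal path that ends at the goal.
    success = 1.0 if valid == 1.0 and path[-1] == maze.goal else 0.0
    return w_valid * valid + w_progress * progress + w_success * success


# Usage: a 3x3 maze with a wall in the centre.
maze = Maze(height=3, width=3, walls=frozenset({(1, 1)}), start=(0, 0), goal=(2, 2))
legal_path = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]
wall_crossing_path = [(0, 0), (1, 0), (1, 1), (1, 2), (2, 2)]
print(trajectory_reward(maze, legal_path))
print(trajectory_reward(maze, wall_crossing_path))
```

With these weights the legal path scores 1.0 and the wall-crossing path about 0.425. Every term is computed from the maze definition rather than by a learned judge, so a path that looks plausible but crosses a wall is penalized deterministically.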

Findings

01

Verifiable rewards improve exact-match accuracy by 29.1% on maze tasks.

02

Verifiable rewards enhance trap-avoidance performance by 51.4%.

03

Multimodal reward models can drive training toward degenerate solutions; verifiable rewards avoid this failure mode.

Abstract

Video generation models produce visually coherent content but struggle with tasks requiring spatial reasoning and multi-step planning. Reinforcement learning (RL) offers a path to improve generalization, but its effectiveness in video reasoning hinges on reward design, a challenge that has received little systematic study. We investigate this problem by adapting Group Relative Policy Optimization (GRPO) to flow-based video models and training them on maze-solving and robotic navigation tasks. We first show that multimodal reward models fail catastrophically in this setting. To address this, we design verifiable reward functions grounded in objective task metrics. For structured game environments, we introduce a multi-component trajectory reward. For robotic navigation, we propose an embedding-level verifiable reward. Our experiments show that RL fine-tuning with verifiable rewards…
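The abstract names GRPO but does not give its update rule. As background, the sketch below shows the two pieces GRPO is usually described by: each sample's reward is normalized against the other samples drawn for the same prompt (so no learned value function is needed), and the resulting advantage enters a PPO-style clipped objective. It is plain Python under stated assumptions. `group_relative_advantages`, `clipped_surrogate`, the use of the population standard deviation, `eps` and `clip_eps=0.2` are illustrative choices. The likelihood `ratios` are taken as given, because obtaining them from a flow-based video model is the part the paper adapts and is not reproduced here. The KL-penalty term is omitted.

```python
import statistics


def group_relative_advantages(group_rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: normalize each reward against its own sampling group."""
    mean = statistics.fmean(group_rewards)
    std = statistics.pstdev(group_rewards)  # population std; sample std is also common
    return [(r - mean) / (std + eps) for r in group_rewards]


def clipped_surrogate(ratios: list[float], advantages: list[float],
                      clip_eps: float = 0.2) -> float:
    """PPO-style clipped objective averaged over the group (to be maximized).

    `ratios` are assumed to be pi_new(sample) / pi_old(sample) likelihood ratios; for a
    flow-based video model they would have to come from a stochastic sampler, which
    this sketch does not implement.
    """
    terms = []
    for ratio, adv in zip(ratios, advantages):
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        terms.append(min(ratio * adv, clipped * adv))
    return statistics.fmean(terms)


# Usage: four videos sampled for one prompt, scored by a verifiable reward.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
print(clipped_surrogate([1.0] * 4, advantages))
```

If every sample in a group earns the same reward, all advantages are zero and the group contributes no gradient; a reward with graded components can make such all-equal groups less frequent.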
