Scalable Policy Evaluation with Video World Models

Wei-Cheng Tseng; Jinwei Gu; Qinsheng Zhang; Hanzi Mao; Ming-Yu Liu; Florian Shkurti; Lin Yen-Chen

arXiv:2511.11520 · cs.RO · December 5, 2025

TL;DR

This paper proposes using action-conditional video world models trained on internet videos to evaluate robotic policies efficiently, reducing reliance on costly real-world testing and simulation environments.

Contribution

The paper introduces a scalable, video-based world model approach to policy evaluation that leverages pre-trained models and internet videos, addressing the data-collection burden and the sim-to-real gap.

Findings

01 Models correlate well with actual policy values

02 Effective across multiple evaluation metrics

03 Reduces need for real-world robot testing
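One way to make the first finding concrete is to measure how closely policy values estimated inside a world model track the values measured on a real robot. The sketch below computes a Pearson correlation coefficient over paired scores. The choice of Pearson correlation, the function name `pearson`, and the numbers are all assumptions for illustration; the listing does not say which metrics the paper uses, and the values are placeholders, not results from the paper.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equally long score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered products and squares (the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Placeholder scores for three hypothetical policies (NOT from the paper).
real_world_values = [0.2, 0.5, 0.8]
world_model_values = [0.3, 0.4, 0.9]

if __name__ == "__main__":
    r = pearson(real_world_values, world_model_values)
    print(f"Pearson r = {r:.3f}")
```

A coefficient near 1 would mean the world model ranks and scales policies much as real-world testing does. A rank-based measure could be swapped in when only the ordering of policies matters.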

Abstract

Training generalist policies for robotic manipulation has shown great promise, as they enable language-conditioned, multi-task behaviors across diverse scenarios. However, evaluating these policies remains difficult because real-world testing is expensive, time-consuming, and labor-intensive. It also requires frequent environment resets and carries safety risks when deploying unproven policies on physical robots. Manually creating and populating simulation environments with assets for robotic manipulation has not addressed these issues, primarily due to the significant engineering effort required and the substantial sim-to-real gap, both in terms of physics and rendering. In this paper, we explore the use of action-conditional video generation models as a scalable way to learn world models for policy evaluation. We demonstrate how to incorporate action conditioning into existing…
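The general idea of evaluating a policy inside an action-conditional world model can be sketched as a closed loop: the policy picks an action from the latest frame, the model predicts the next frame given that action, and a success check scores the imagined rollout. The code below is a minimal sketch under stated assumptions, not the authors' implementation. The names `rollout_in_world_model`, `estimate_policy_value`, and the `toy_*` helpers are hypothetical, and a 1-D position stands in for video frames so the example runs without any model.

```python
from typing import Callable, List, Sequence

Frame = List[float]   # stand-in for an image observation (assumption)
Action = List[float]  # stand-in for a robot action (assumption)


def rollout_in_world_model(
    policy: Callable[[Frame], Action],
    world_model: Callable[[Frame, Action], Frame],
    first_frame: Frame,
    horizon: int,
) -> List[Frame]:
    """Alternate policy and world-model calls for `horizon` steps."""
    frames = [first_frame]
    for _ in range(horizon):
        action = policy(frames[-1])                     # act on the latest frame
        frames.append(world_model(frames[-1], action))  # predict the next frame
    return frames


def estimate_policy_value(
    policy: Callable[[Frame], Action],
    world_model: Callable[[Frame, Action], Frame],
    initial_frames: Sequence[Frame],
    horizon: int,
    is_success: Callable[[List[Frame]], bool],
) -> float:
    """Fraction of imagined rollouts that the success check accepts."""
    if not initial_frames:
        raise ValueError("need at least one initial frame")
    wins = sum(
        is_success(rollout_in_world_model(policy, world_model, f, horizon))
        for f in initial_frames
    )
    return wins / len(initial_frames)


# Toy stand-ins: a 1-D "scene" where the frame is just a position.
GOAL = 1.0


def toy_policy(frame: Frame) -> Action:
    # Move toward the goal, at most 0.25 per step.
    return [max(-0.25, min(0.25, GOAL - frame[0]))]


def toy_world_model(frame: Frame, action: Action) -> Frame:
    # Deterministic dynamics: the action is added to the position.
    return [frame[0] + action[0]]


def toy_success(frames: List[Frame]) -> bool:
    return abs(frames[-1][0] - GOAL) < 0.05


if __name__ == "__main__":
    starts = [[0.0], [0.5], [-3.0]]
    value = estimate_policy_value(toy_policy, toy_world_model, starts, 4, toy_success)
    print(f"estimated success rate: {value:.2f}")
```

In a real setting the `world_model` callable would wrap a learned video generation model and `is_success` would be a human or automated judge of the generated video; both are left abstract here because the listing does not describe them.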

Taxonomy

Topics: Robot Manipulation and Learning · Reinforcement Learning in Robotics · Multimodal Machine Learning Applications