WorldEval: World Model as Real-World Robot Policies Evaluator

Yaxuan Li; Yichen Zhu; Junjie Wen; Chaomin Shen; Yi Xu

arXiv:2505.19017 · cs.RO · May 27, 2025

TL;DR

This paper introduces WorldEval, a scalable and reliable framework that uses world models to evaluate robot manipulation policies efficiently and safely. Its scores correlate strongly with real-world performance, and it outperforms real-to-sim evaluation methods.

Contribution

It presents Policy2Vec, which turns a video generation model into a world simulator that follows a policy's latent actions, and builds WorldEval on top of it as an automated online evaluation pipeline for robot policies.

Findings

1. Strong correlation between WorldEval and real-world performance
2. WorldEval outperforms real-to-sim evaluation methods
3. Enables safe, scalable policy assessment in diverse environments
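The first finding is a claim about agreement between two sets of scores: what WorldEval reports for each policy and what real robot trials report. This summary does not say which statistic the authors use, so the sketch below assumes two common choices: the Pearson correlation over per-policy success rates, and the fraction of policy pairs that both evaluators rank in the same order. Both function names are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import Sequence


def pearson_r(proxy: Sequence[float], real: Sequence[float]) -> float:
    """Pearson correlation between proxy (e.g. world-model) and real-world scores."""
    if len(proxy) != len(real) or len(proxy) < 2:
        raise ValueError("need two equal-length sequences with at least 2 entries")
    n = len(proxy)
    mp, mr = sum(proxy) / n, sum(real) / n
    cov = sum((p - mp) * (r - mr) for p, r in zip(proxy, real))
    var_p = sum((p - mp) ** 2 for p in proxy)
    var_r = sum((r - mr) ** 2 for r in real)
    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_p * var_r)


def pairwise_rank_agreement(proxy: Sequence[float], real: Sequence[float]) -> float:
    """Fraction of policy pairs that both evaluators order the same way.

    Pairs tied in either list are skipped, so the result is computed
    over untied pairs only and lies in [0, 1].
    """
    if len(proxy) != len(real):
        raise ValueError("sequences must have equal length")
    agree = total = 0
    for i, j in combinations(range(len(proxy)), 2):
        product = (proxy[i] - proxy[j]) * (real[i] - real[j])
        if product == 0:
            continue  # a tie in either list says nothing about ordering
        total += 1
        if product > 0:
            agree += 1  # same sign: both evaluators prefer the same policy
    if total == 0:
        raise ValueError("no untied pairs to compare")
    return agree / total
```

Given per-policy success rates from WorldEval and from real robot trials, values near 1 from both functions would indicate that the proxy ranks policies the way the real world does. The rank-based measure matters in practice because an evaluator is mostly used to pick the better of two policies.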

Abstract

The field of robotics has made significant strides toward developing generalist robot manipulation policies. However, evaluating these policies in real-world scenarios remains time-consuming and challenging, particularly as the number of tasks scales and environmental conditions change. In this work, we demonstrate that world models can serve as a scalable, reproducible, and reliable proxy for real-world robot policy evaluation. A key challenge is generating accurate policy videos from world models that faithfully reflect the robot actions. We observe that directly inputting robot actions or using high-dimensional encoding methods often fails to generate action-following videos. To address this, we propose Policy2Vec, a simple yet effective approach that turns a video generation model into a world simulator that follows latent actions to generate robot videos. We then introduce…
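The abstract describes conditioning a video generation model on a policy's latent action instead of on raw robot actions, so that the policy can be rolled out inside the world model. The sketch below shows only that closed-loop structure, assuming the policy exposes a latent action vector and the world model predicts a short chunk of future frames from the frames so far plus that vector. Every class and function name is hypothetical, frames are plain lists standing in for image tensors, and nothing here describes how Policy2Vec is trained or how task success is judged.

```python
from typing import List, Protocol

Frame = List[float]   # hypothetical stand-in for an image tensor
Latent = List[float]  # hypothetical stand-in for a policy's latent action vector


class LatentPolicy(Protocol):
    def latent_action(self, frame: Frame, instruction: str) -> Latent:
        """Map the current observation and task instruction to a latent action."""
        ...


class LatentWorldModel(Protocol):
    def generate(self, context: List[Frame], latent: Latent, n_frames: int) -> List[Frame]:
        """Predict the next n_frames of video, conditioned on a latent action."""
        ...


def rollout(policy: LatentPolicy, world_model: LatentWorldModel,
            first_frame: Frame, instruction: str,
            steps: int, chunk: int = 4) -> List[Frame]:
    """Closed-loop rollout: the policy acts on generated frames, not camera images."""
    video: List[Frame] = [first_frame]
    for _ in range(steps):
        latent = policy.latent_action(video[-1], instruction)
        # The world model, not a physical robot, produces the next observations.
        video.extend(world_model.generate(video, latent, chunk))
    return video


# Toy stand-ins (hypothetical) so the loop can run end to end.
class ToyPolicy:
    def latent_action(self, frame: Frame, instruction: str) -> Latent:
        return [0.1] * len(frame)  # constant push; ignores the instruction


class ToyWorldModel:
    def generate(self, context: List[Frame], latent: Latent, n_frames: int) -> List[Frame]:
        frames, last = [], context[-1]
        for _ in range(n_frames):
            last = [x + z for x, z in zip(last, latent)]  # each frame follows the latent
            frames.append(last)
        return frames


video = rollout(ToyPolicy(), ToyWorldModel(), [0.0, 0.0], "pick up the cup", steps=3)
print(len(video))  # 13: the initial frame plus 3 chunks of 4 frames
```

The point the sketch tries to capture is that the policy only ever sees frames the world model generated, which is what would let evaluation proceed without a physical robot.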
