dWorldEval: Scalable Robotic Policy Evaluation via Discrete Diffusion World Model
Yaxuan Li, Zhongyi Zhou, Yefei Chen, Yaokai Xue, and Yichen Zhu

TL;DR
dWorldEval introduces a scalable, transformer-based discrete diffusion world model that unifies multiple modalities for efficient robotics policy evaluation across diverse tasks and environments.
Contribution
The paper presents dWorldEval, which uses a discrete diffusion world model, built on a single transformer denoising network with a sparse keyframe memory, as a scalable proxy for robotics policy evaluation.
Findings
dWorldEval is reported to outperform previous methods such as WorldEval, Ctrl-World, and WorldGym.
It effectively models vision, language, and actions in a unified token space.
The approach targets evaluation at the scale of thousands of environments and tasks, which the abstract describes as infeasible with existing approaches.
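One common way to realize a unified token space is to give each modality its own disjoint range of ids inside one shared vocabulary. The sketch below illustrates that idea only; the vocabulary sizes, the names `VOCAB_SIZES`, `to_unified`, and `from_unified`, and the offset scheme itself are assumptions for illustration, not details taken from the paper.

```python
# Hypothetical per-modality vocabulary sizes (not taken from the paper).
VOCAB_SIZES = {"vision": 8192, "language": 32000, "action": 256}


def build_offsets(vocab_sizes):
    """Assign each modality a disjoint id range in one shared vocabulary."""
    offsets, start = {}, 0
    for name, size in vocab_sizes.items():
        offsets[name] = start
        start += size
    return offsets, start  # start now equals the total vocabulary size


OFFSETS, TOTAL_VOCAB = build_offsets(VOCAB_SIZES)


def to_unified(modality, local_ids):
    """Shift modality-local token ids into the shared id space."""
    size = VOCAB_SIZES[modality]
    for i in local_ids:
        if not 0 <= i < size:
            raise ValueError(f"id {i} is out of range for {modality}")
    return [OFFSETS[modality] + i for i in local_ids]


def from_unified(token):
    """Recover (modality, local id) from a shared-space token id."""
    for name, offset in OFFSETS.items():
        if offset <= token < offset + VOCAB_SIZES[name]:
            return name, token - offset
    raise ValueError(f"token {token} is outside the shared vocabulary")


# One mixed sequence: instruction tokens, image tokens, then an action token.
sequence = (
    to_unified("language", [5, 17])
    + to_unified("vision", [0, 8191])
    + to_unified("action", [3])
)
```

Because every id maps back to exactly one modality, a single network can consume and emit the mixed sequence without modality-specific heads, which is the property the findings describe.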
Abstract
Evaluating robotics policies across thousands of environments and thousands of tasks is infeasible with existing approaches. This motivates the need for a new methodology for scalable robotics policy evaluation. In this paper, we propose dWorldEval, which uses a discrete diffusion world model as a scalable evaluation proxy for robotics policies. Specifically, dWorldEval maps all modalities, including vision, language, and robotic actions, into a unified token space and models them with a single transformer-based denoising network. Building on this architecture, we employ a sparse keyframe memory…
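To make the idea of denoising discrete tokens concrete, here is a minimal sketch of one widely used formulation, absorbing-state (masked) discrete diffusion. Whether dWorldEval uses this exact formulation is not stated in the text above, so it is an assumption. The `MASK` id, the `corrupt` and `denoise` helpers, and the `toy_predictor` stand-in (which simply returns the known clean sequence in place of a trained transformer) are all hypothetical.

```python
import random

MASK = -1  # hypothetical id for the absorbing [MASK] state


def corrupt(tokens, mask_ratio, rng):
    """Forward process: mask each token independently with probability mask_ratio."""
    return [MASK if rng.random() < mask_ratio else t for t in tokens]


def denoise(tokens, predict, steps):
    """Reverse process: reveal masked positions over at most `steps` rounds.

    `predict` stands in for the denoising network. It receives the current
    sequence and returns one (token, confidence) pair per position.
    """
    tokens = list(tokens)
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        preds = predict(tokens)
        # Reveal an even share of the remaining masks, most confident first.
        remaining_steps = steps - step
        k = -(-len(masked) // remaining_steps)  # ceiling division
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[:k]:
            tokens[i] = preds[i][0]
    return tokens


def toy_predictor(target):
    """Stand-in for a trained model: always predicts the known clean sequence."""
    def predict(tokens):
        return [(t, 1.0 / (1 + i)) for i, t in enumerate(target)]
    return predict


rng = random.Random(0)
clean = [8197, 8209, 0, 8191, 40195]
noisy = corrupt(clean, 0.6, rng)
restored = denoise(noisy, toy_predictor(clean), steps=3)
```

In a real system the predictor would be the trained network conditioned on the unmasked context; the loop structure is the part this sketch is meant to show.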
