dWorldEval: Scalable Robotic Policy Evaluation via Discrete Diffusion World Model

Yaxuan Li, Zhongyi Zhou, Yefei Chen, Yaokai Xue, and Yichen Zhu

arXiv:2604.22152 · cs.RO · April 27, 2026

TL;DR

dWorldEval introduces a scalable, transformer-based discrete diffusion world model that unifies multiple modalities for efficient robotics policy evaluation across diverse tasks and environments.

Contribution

The paper presents dWorldEval, a novel approach using a discrete diffusion world model with a transformer network and memory mechanisms for scalable robotics policy evaluation.

Findings

1. dWorldEval outperforms previous methods like WorldEval, Ctrl-World, and WorldGym.

2. It effectively models vision, language, and actions in a unified token space.

3. The approach enables evaluation across thousands of environments and tasks.

Abstract

Evaluating robotics policies across thousands of environments and thousands of tasks is infeasible with existing approaches. This motivates the need for a new methodology for scalable robotics policy evaluation. In this paper, we propose dWorldEval, which uses a discrete diffusion world model as a scalable evaluation proxy for robotics policies. Specifically, dWorldEval maps all modalities, including vision, language, and robotic actions, into a unified token space and models them with a single transformer-based denoising network. Building on this architecture, we employ a sparse keyframe memory…
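As a rough illustration of what a "unified token space" can look like, the sketch below gives each modality its own contiguous id range in one shared vocabulary and then applies a masking corruption, which is one common forward process for discrete diffusion. Everything here is an assumption made for illustration: the vocabulary sizes, the names `VOCAB_SIZES`, `to_unified`, and `mask_tokens`, and the choice of an absorbing-state (mask-based) noise process are hypothetical and are not taken from the paper, whose tokenizers and noise schedule are not described in this excerpt.

```python
import random

# Hypothetical vocabulary sizes; the excerpt does not state the real ones.
VOCAB_SIZES = {"vision": 1024, "language": 512, "action": 256}

# Give each modality its own contiguous id range in one shared vocabulary.
OFFSETS = {}
_next_id = 0
for _name, _size in VOCAB_SIZES.items():
    OFFSETS[_name] = _next_id
    _next_id += _size
MASK_ID = _next_id  # one extra id reserved for the mask token
UNIFIED_VOCAB_SIZE = _next_id + 1


def to_unified(modality, local_ids):
    """Shift modality-local token ids into the shared id range."""
    size = VOCAB_SIZES[modality]
    if any(not 0 <= i < size for i in local_ids):
        raise ValueError(f"token id out of range for {modality}")
    return [OFFSETS[modality] + i for i in local_ids]


def mask_tokens(tokens, mask_ratio, rng):
    """Assumed forward (noising) step of an absorbing-state discrete
    diffusion: replace a fixed fraction of positions with MASK_ID."""
    n_mask = round(mask_ratio * len(tokens))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    return [MASK_ID if i in positions else t for i, t in enumerate(tokens)]


# Build one sequence from all three modalities, then corrupt half of it.
sequence = (
    to_unified("language", [5, 17])
    + to_unified("vision", [0, 1023, 42, 7])
    + to_unified("action", [3, 200])
)
noisy = mask_tokens(sequence, mask_ratio=0.5, rng=random.Random(0))
```

In a setup like this, a single denoising network would be trained to predict the original ids at the masked positions of `noisy`, regardless of which modality each position belongs to. The network itself is omitted here because the excerpt gives no architectural details beyond "transformer-based".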
