EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models

Hu Yue; Siyuan Huang; Yue Liao; Shengcong Chen; Pengfei Zhou; Liliang Chen; Maoqing Yao; Guanghui Ren

arXiv:2505.09694 · cs.RO · May 20, 2025

TL;DR

EWMBench is a benchmark framework for evaluating embodied world models on scene, motion, and semantic quality, addressing the need for physically grounded and action-consistent AI-generated scenes.

Contribution

The paper introduces EWMBench, a comprehensive evaluation framework with a curated dataset and tools to assess embodied world models beyond perceptual metrics.

Findings

01. Existing models often lack physical grounding.

02. EWMBench effectively identifies model limitations.

03. The benchmark guides future embodied AI development.

Abstract

Recent advances in creative AI have enabled the synthesis of high-fidelity images and videos conditioned on language instructions. Building on these developments, text-to-video diffusion models have evolved into embodied world models (EWMs) capable of generating physically plausible scenes from language commands, effectively bridging vision and action in embodied AI applications. This work addresses the critical challenge of evaluating EWMs beyond general perceptual metrics to ensure the generation of physically grounded and action-consistent behaviors. We propose the Embodied World Model Benchmark (EWMBench), a dedicated framework designed to evaluate EWMs based on three key aspects: visual scene consistency, motion correctness, and semantic alignment. Our approach leverages a meticulously curated dataset encompassing diverse scenes and motion patterns, alongside a comprehensive…

Code & Models

Repositories

agibottech/ewmbench
PyTorch · Official

Models

agibot-world/EWMBench-model
model · ♡ 2

Datasets

agibot-world/EWMBench
dataset · 241 downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Face recognition and analysis