ReWorld: Multi-Dimensional Reward Modeling for Embodied World Models

Baorui Peng; Wenyao Zhang; Liang Xu; Zekun Qi; Jiazhao Zhang; Hongsi Liu; Wenjun Zeng; Xin Jin

arXiv:2601.12428 · cs.RO · January 21, 2026

TL;DR

ReWorld introduces a reinforcement learning framework that enhances video-based embodied world models by aligning them with physical realism, task performance, and visual quality through a large-scale human preference dataset and a hierarchical reward model.

Contribution

The paper presents ReWorld, a novel framework that employs multi-dimensional reward modeling and reinforcement learning to improve physical fidelity, logical coherence, and visual quality in embodied world models.

Findings

01 ReWorld significantly improves physical realism and task performance.

02 The hierarchical reward model effectively captures human preferences.

03 ReWorld outperforms previous methods in various evaluations.

Abstract

Recently, video-based world models that learn to simulate dynamics have gained increasing attention in robot learning. However, current approaches primarily emphasize visual generative quality while overlooking physical fidelity, dynamic consistency, and task logic, especially for contact-rich manipulation tasks, which limits their applicability to downstream tasks. To this end, we introduce ReWorld, a framework that uses reinforcement learning to align video-based embodied world models with physical realism, task completion capability, embodiment plausibility, and visual quality. Specifically, we first construct a large-scale (~235K) video preference dataset and use it to train a hierarchical reward model designed to capture multi-dimensional rewards consistent with human preferences. We further propose a practical alignment algorithm that post-trains flow-based world…
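The abstract does not state how the reward model is trained on the preference dataset. As a rough illustration only, the sketch below shows one common way to learn from pairwise preferences: a Bradley-Terry loss applied per reward dimension and averaged. The dimension names are taken from the four alignment targets in the abstract; the function names, the label format, and the flat (non-hierarchical) averaging are assumptions for illustration, not the authors' method.

```python
import math

# Hypothetical dimension names, borrowed from the abstract's four alignment targets.
DIMENSIONS = (
    "physical_realism",
    "task_completion",
    "embodiment_plausibility",
    "visual_quality",
)


def bradley_terry_loss(score_preferred, score_rejected):
    """Negative log-likelihood that the preferred video wins under Bradley-Terry."""
    margin = score_preferred - score_rejected
    # -log(sigmoid(margin)) == log(1 + exp(-margin)); branch for numerical stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def multi_dim_preference_loss(scores_a, scores_b, labels):
    """Average the pairwise loss over every dimension that has a preference label.

    scores_a, scores_b: dict mapping dimension -> scalar reward for videos A and B.
    labels: dict mapping dimension -> "a" or "b" (the video annotators preferred).
    This flat average is an assumption; it does not model any hierarchy.
    """
    losses = []
    for dim in DIMENSIONS:
        if dim not in labels:
            continue  # dimension was not annotated for this pair
        if labels[dim] == "a":
            losses.append(bradley_terry_loss(scores_a[dim], scores_b[dim]))
        else:
            losses.append(bradley_terry_loss(scores_b[dim], scores_a[dim]))
    if not losses:
        raise ValueError("no labeled dimensions for this pair")
    return sum(losses) / len(losses)
```

Scoring each dimension separately lets a single video pair carry mixed preferences, for example one clip preferred for visual quality and the other for physical realism, which a single scalar reward could not represent.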

Taxonomy

Topics: Social Robot Interaction and HRI · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications