EVA: Aligning Video World Models with Executable Robot Actions via Inverse Dynamics Rewards

Ruixiang Wang; Qingming Liu; Yueci Deng; Guiliang Liu; Zhen Liu; Kui Jia

arXiv:2603.17808 · cs.RO · March 25, 2026

TL;DR

This paper introduces EVA, a reinforcement learning framework that aligns video world models with executable robot actions by using inverse dynamics as a reward, improving the physical plausibility and task success of generated videos.

Contribution

EVA uses an inverse dynamics model as a reward signal for training video world models, reducing embodiment artifacts and improving task performance on real robots.

Findings

01. EVA reduces embodiment artifacts in generated videos.
02. EVA improves task success rates on RoboTwin and real robots.
03. EVA aligns visual models with physical constraints effectively.

Abstract

Video generative models are increasingly used as world models for robotics, where a model generates a future visual rollout conditioned on the current observation and task instruction, and an inverse dynamics model (IDM) converts the generated frames into executable robot actions. However, current video world models lack explicit executability constraints. As a result, visually coherent rollouts may still violate rigid-body and kinematic consistency, producing unstable or infeasible control commands when decoded by an IDM. We refer to this mismatch between visual generation and physically executable control as the executability gap. While this gap can be mitigated at inference time using techniques such as rejection sampling, such approaches are inefficient due to the high cost of video generation. In this paper, we leverage the executability gap as a training signal and introduce…
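
To make the executability gap and the cost of inference-time rejection sampling concrete, here is a minimal sketch. It is not the paper's reward, which this truncated abstract does not specify. It assumes a hypothetical IDM has already decoded each generated rollout into a sequence of joint positions, and it scores a rollout by how far the implied joint velocities exceed an assumed limit. The names `executability_reward` and `rejection_sample`, and the parameters `vel_limit` and `dt`, are illustrative only.

```python
def executability_reward(actions, vel_limit=1.0, dt=0.1):
    """Illustrative stand-in reward (an assumption, not the paper's formula).

    `actions` is a list of joint-position vectors, one per frame, as a
    hypothetical IDM might decode them. The reward is 0 when every joint
    velocity stays within `vel_limit` and grows more negative as the
    decoded motion becomes less feasible.
    """
    excess = []
    for prev, curr in zip(actions, actions[1:]):
        for p, c in zip(prev, curr):
            speed = abs(c - p) / dt
            excess.append(max(0.0, speed - vel_limit))
    if not excess:
        return 0.0
    return -sum(excess) / len(excess)


def rejection_sample(candidates, reward_fn, threshold=0.0):
    """Return (index, reward) of the first candidate meeting the threshold.

    Falls back to the best-scoring candidate if none qualifies. Every
    candidate stands for one full video generation, which is why this
    strategy is expensive at inference time.
    """
    rewards = [reward_fn(c) for c in candidates]
    for i, r in enumerate(rewards):
        if r >= threshold:
            return i, r
    best = max(range(len(rewards)), key=rewards.__getitem__)
    return best, rewards[best]


# Two toy "decoded rollouts" for a 2-joint arm.
smooth = [[0.05 * t, 0.05 * t] for t in range(11)]   # steady motion
jumpy = [row[:] for row in smooth]
jumpy[5] = [x + 2.0 for x in jumpy[5]]               # one-frame teleport

index, reward = rejection_sample([jumpy, smooth], executability_reward)
print(index, reward)
```

In this toy setup the rollout with a one-frame jump receives a negative reward and is rejected, while the smooth one is accepted. Using a score of this kind as a training signal, rather than as an inference-time filter, is the direction the abstract describes.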

Code & Models

Models

🤗 RobbinWang123/EVA · model · ♡ 2

Taxonomy

Topics: Robot Manipulation and Learning · Social Robot Interaction and HRI · Human Motion and Animation