ABot-PhysWorld: Interactive World Foundation Model for Robotic Manipulation with Physics Alignment
Yuzhi Chen, Ronghan Chen, Dongjie Huo, Yandan Yang, Dekang Qi, Haoyun Liu, Tong Lin, Shuang Zeng, Junjin Xiao, Xinyuan Chang, Feng Xiong, Xing Wei, Zhiheng Ma, and Mu Xu

TL;DR
ABot-PhysWorld is a 14B-parameter Diffusion Transformer that generates realistic, physically plausible robotic manipulation videos. It is trained on a curated dataset of three million manipulation clips with physics-aware annotations and evaluated on PBench and the newly introduced EZSbench.
Contribution
The paper introduces a physics-aware, DPO-based post-training framework for a large video diffusion model, together with a new benchmark for evaluating physical realism and action alignment in generated robotic videos.
Findings
Achieves state-of-the-art results on PBench and EZSbench benchmarks.
Surpasses previous models in physical plausibility and trajectory consistency.
Introduces a new zero-shot benchmark for embodied video generation evaluation.
Abstract
Video-based world models offer a powerful paradigm for embodied simulation and planning, yet state-of-the-art models often generate physically implausible manipulations, such as object penetration and anti-gravity motion, because they are trained on generic visual data with likelihood-based objectives that ignore physical laws. We present ABot-PhysWorld, a 14B Diffusion Transformer model that generates visually realistic, physically plausible, and action-controllable videos. Built on a curated dataset of three million manipulation clips with physics-aware annotation, it uses a novel DPO-based post-training framework with decoupled discriminators to suppress unphysical behaviors while preserving visual quality. A parallel context block enables precise spatial action injection for cross-embodiment control. To better evaluate generalization, we introduce EZSbench, the first training-independent…
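The abstract names DPO (Direct Preference Optimization) as the basis of the post-training stage. As a point of reference only, the sketch below shows the standard DPO loss for a single preference pair, where "w" is the preferred sample (for example, a physically plausible clip) and "l" the rejected one. It is not the authors' implementation: the function name `dpo_loss`, the default `beta=0.1`, and the assumption that per-sample log-likelihoods are available are all illustrative, and how the paper's decoupled discriminators produce or weight preference pairs is not specified here.

```python
import math


def dpo_loss(policy_logp_w: float, policy_logp_l: float,
             ref_logp_w: float, ref_logp_l: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair (w preferred over l).

    Illustrative sketch: inputs are log-likelihoods of the preferred (w)
    and rejected (l) samples under the trainable policy and a frozen
    reference model. beta=0.1 is a hypothetical default.
    """
    # Implicit reward margin: how much more the policy (relative to the
    # reference) favors the preferred sample over the rejected one.
    margin = beta * ((policy_logp_w - ref_logp_w) - (policy_logp_l - ref_logp_l))
    # -log(sigmoid(margin)) == softplus(-margin), computed in a
    # numerically stable form that avoids overflow for large |margin|.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


if __name__ == "__main__":
    # Policy matches the reference: loss equals log(2).
    print(dpo_loss(0.0, 0.0, 0.0, 0.0))
    # Policy shifts probability toward the preferred sample: loss drops.
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

Exact likelihoods are intractable for diffusion models, so diffusion-specific variants (such as Diffusion-DPO) replace the log-likelihood terms with differences in denoising error between the policy and reference models; the pairwise sigmoid structure above stays the same.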
