World-Ego Modeling for Long-Horizon Evolution in Hybrid Embodied Tasks
Zuyao Lin, Jianhui Zhang, Peidong Jia, Xiaoguang Zhao, Shanghang Zhang, Xingyu Chen

TL;DR
This paper introduces World-Ego Modeling, a paradigm for long-horizon embodied tasks that decomposes future evolution into world and ego components, improving performance on hybrid tasks that interleave navigation and manipulation.
Contribution
It proposes a world-ego decomposition framework, instantiates it as the World-Ego Model (WEM), and introduces HTEWorld, a benchmark for evaluating long-horizon hybrid embodied tasks.
Findings
WEM achieves state-of-the-art results on HTEWorld.
World-ego decomposition improves long-horizon task performance.
HTEWorld provides extensive data for benchmarking hybrid tasks.
Abstract
World models are widely explored in embodied intelligence, yet they typically predict distinct evolutions of the world and the ego within a single stream, where the world captures persistent instruction-agnostic scene regularities and the ego captures robot-centric instruction-conditioned dynamics. This world-ego entanglement leads to a degradation in long-horizon embodied scenarios, particularly in hybrid tasks with interleaved navigation and manipulation behaviors. In this paper, we introduce World-Ego Modeling, a new conceptual paradigm that decomposes future evolution into world and ego components. We define the world-ego boundary from three perspectives, i.e., motion-, semantic-, and intention-based views, and analyze three disentanglement strategies with post-, pre-, and full disentanglement. Further, we instantiate this paradigm as the World-Ego Model (WEM), a unified…
