World-Ego Modeling for Long-Horizon Evolution in Hybrid Embodied Tasks

Zuyao Lin; Jianhui Zhang; Peidong Jia; Xiaoguang Zhao; Shanghang Zhang; Xingyu Chen

arXiv:2605.19957 · cs.CV · May 20, 2026

TL;DR

This paper introduces World-Ego Modeling, a new paradigm for long-horizon embodied tasks that decomposes world and ego dynamics, improving performance in hybrid navigation-manipulation scenarios.

Contribution

It proposes a novel world-ego decomposition framework, instantiates it as the WEM model, and creates HTEWorld, a benchmark for evaluating long-horizon hybrid embodied tasks.

Findings

01 WEM achieves state-of-the-art results on HTEWorld.

02 World-ego decomposition improves long-horizon task performance.

03 HTEWorld provides extensive data for benchmarking hybrid tasks.

Abstract

World models are widely explored in embodied intelligence, yet they typically predict the distinct evolutions of the world and the ego within a single stream, where the world captures persistent, instruction-agnostic scene regularities and the ego captures robot-centric, instruction-conditioned dynamics. This world-ego entanglement degrades performance in long-horizon embodied scenarios, particularly in hybrid tasks with interleaved navigation and manipulation behaviors. In this paper, we introduce World-Ego Modeling, a new conceptual paradigm that decomposes future evolution into world and ego components. We define the world-ego boundary from three perspectives, i.e., motion-, semantic-, and intention-based views, and analyze three disentanglement strategies: post-, pre-, and full disentanglement. Further, we instantiate this paradigm as the World-Ego Model (WEM), a unified…

Code & Models

Datasets

Zoorao/HTEWorld
dataset · 1.1k downloads
