EgoSim: Egocentric World Simulator for Embodied Interaction Generation

Jinkun Hao; Mingda Jia; Ruiyan Wang; Xihui Liu; Ran Yi; Lizhuang Ma; Jiangmiao Pang; Xudong Xu

arXiv:2604.01001 · cs.CV · April 2, 2026

TL;DR

EgoSim is an egocentric world simulator that generates spatially consistent interaction videos, updates the 3D scene state as interactions unfold, and addresses the training-data bottleneck with a scalable pipeline and a low-cost capture system.

Contribution

EgoSim introduces a closed-loop simulation framework with explicit 3D grounding, a scalable data pipeline from monocular videos, and a low-cost data collection system, advancing egocentric interaction modeling.

Findings

01 EgoSim outperforms existing methods in visual quality and spatial consistency.

02 It generalizes well to complex scenes and in-the-wild interactions.

03 It supports cross-embodiment transfer to robotic manipulation.

Abstract

We introduce EgoSim, a closed-loop egocentric world simulator that generates spatially consistent interaction videos and persistently updates the underlying 3D scene state for continuous simulation. Existing egocentric simulators either lack explicit 3D grounding, causing structural drift under viewpoint changes, or treat the scene as static, failing to update world states across multi-stage interactions. EgoSim addresses both limitations by modeling 3D scenes as updatable world states. We generate embodiment interactions via a Geometry-action-aware Observation Simulation model, with spatial consistency from an Interaction-aware State Updating module. To overcome the critical data bottleneck posed by the difficulty in acquiring densely aligned scene-interaction training pairs, we design a scalable pipeline that extracts static point clouds, camera trajectories, and embodiment actions…
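
The closed loop described in the abstract (simulate an observation, then update the 3D scene state so the next step starts from the changed world) can be illustrated with a toy sketch. This is not the authors' implementation. The names `WorldState`, `Action`, `simulate_observation`, `update_state`, and `run_closed_loop` are hypothetical, the scene is a handful of labeled points rather than a real point cloud, and the "observation" is a simple visibility count standing in for a generated video.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


@dataclass
class WorldState:
    """Toy stand-in for an updatable 3D scene: object name -> list of points."""
    objects: Dict[str, List[Point]] = field(default_factory=dict)


@dataclass
class Action:
    """Hypothetical action: translate one named object by a 3D offset."""
    target: str
    offset: Point


def simulate_observation(state: WorldState, camera_z: float) -> Dict[str, int]:
    """Placeholder for an observation model.

    A real model would generate video frames; this only counts, per object,
    the points whose z coordinate is greater than the camera's z.
    """
    return {
        name: sum(1 for (_, _, z) in pts if z > camera_z)
        for name, pts in state.objects.items()
    }


def update_state(state: WorldState, action: Action) -> WorldState:
    """Return a new state in which the target object's points are translated."""
    dx, dy, dz = action.offset
    moved = {
        name: [(x + dx, y + dy, z + dz) for (x, y, z) in pts]
        if name == action.target else list(pts)
        for name, pts in state.objects.items()
    }
    return WorldState(objects=moved)


def run_closed_loop(state: WorldState, actions: List[Action], camera_z: float = 0.0):
    """Apply each action, observe the result, and carry the updated state forward."""
    observations = []
    for action in actions:
        state = update_state(state, action)
        observations.append(simulate_observation(state, camera_z))
    return state, observations


initial = WorldState(objects={
    "cup": [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0)],
    "table": [(0.0, -0.5, 1.0)],
})
# Two-stage interaction: move the cup behind the camera, then back in front.
actions = [Action("cup", (0.0, 0.0, -2.0)), Action("cup", (0.0, 0.0, 1.5))]
final_state, observations = run_closed_loop(initial, actions)
```

The point of the sketch is only that the second action operates on the state left by the first one, rather than on the original static scene. A simulator that treated the scene as static would evaluate both actions against `initial`.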
