Ego2World: Compiling Egocentric Cooking Videos into Executable Worlds for Belief-State Planning
Qinchuan Cheng, Zhantao Gong, Pengzhan Sun, Angela Yao, Xulei Yang, and Shijie Li

TL;DR
Ego2World is a benchmark that converts egocentric cooking videos into executable symbolic worlds for testing belief-state planning by embodied agents under partial observation.
Contribution
It introduces Ego2World, a benchmark that derives executable symbolic worlds from egocentric videos to evaluate belief-state planning.
Findings
Action-overlap scores overestimate success.
Persistent belief memory improves task completion.
Belief maintenance is crucial for embodied-agent evaluation.
Abstract
Embodied agents in household environments must plan under partial observation: they need to remember objects, track state changes, and recover when actions fail. Existing benchmarks only partially test this ability. Egocentric video datasets capture realistic human activities but remain passive, while interactive simulators support execution but rely on synthetic scenes and hand-crafted dynamics, which introduces a sim-to-real gap and often assumes a fully observable state. We introduce Ego2World, a benchmark that turns egocentric cooking videos into executable symbolic worlds governed by graph-transition rules. Built on HD-EPIC, Ego2World derives reusable transition rules from video annotations and executes them in a hidden symbolic world graph. During evaluation, the simulator maintains the hidden world graph, while the agent plans over its own partial belief graph using only…
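The split between a hidden world graph and a partial belief graph can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the Ego2World code: the names `Rule`, `World`, and `Belief`, the encoding of the graph as (subject, relation, object) triples, and the choice to model partial observation as "only edges of currently visible objects are returned".

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    """Hypothetical graph-transition rule over (subject, relation, object) edges."""
    name: str
    pre: frozenset      # edges that must exist before the action
    add: frozenset      # edges created by the action
    delete: frozenset   # edges removed by the action


@dataclass
class World:
    """Hidden symbolic world graph kept by the simulator (illustrative only)."""
    edges: set

    def step(self, rule: Rule) -> bool:
        # The action fails and leaves the graph unchanged if preconditions do not hold.
        if not rule.pre <= self.edges:
            return False
        self.edges = (self.edges - rule.delete) | rule.add
        return True

    def observe(self, visible: set) -> set:
        # Assumed observation model: only edges whose subject is visible are revealed.
        return {e for e in self.edges if e[0] in visible}


@dataclass
class Belief:
    """Agent-side partial belief graph that persists between observations."""
    edges: set = field(default_factory=set)

    def update(self, observed: set, visible: set) -> None:
        # Replace beliefs about visible objects; keep remembered facts about the rest.
        self.edges = {e for e in self.edges if e[0] not in visible} | observed


world = World({("knife", "in", "drawer"),
               ("drawer", "state", "closed"),
               ("onion", "on", "counter")})
belief = Belief()

open_drawer = Rule("open drawer",
                   pre=frozenset({("drawer", "state", "closed")}),
                   add=frozenset({("drawer", "state", "open")}),
                   delete=frozenset({("drawer", "state", "closed")}))
take_knife = Rule("take knife",
                  pre=frozenset({("drawer", "state", "open"), ("knife", "in", "drawer")}),
                  add=frozenset({("knife", "in", "hand")}),
                  delete=frozenset({("knife", "in", "drawer")}))

# Initially the agent sees the onion and the drawer, but not the knife inside it.
belief.update(world.observe({"onion", "drawer"}), {"onion", "drawer"})

failed = world.step(take_knife)        # False: the drawer is still closed
opened = world.step(open_drawer)       # True: precondition holds
belief.update(world.observe({"drawer", "knife"}), {"drawer", "knife"})
took = world.step(take_knife)          # True: drawer open and knife inside
```

In this sketch the belief graph keeps the onion's location even after the onion leaves view, which is the kind of persistent memory the findings refer to; a memoryless agent would instead rebuild its belief from each observation alone.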
