3D-Belief: Embodied Belief Inference via Generative 3D World Modeling
Yifan Yin, Zehao Wen, Jieneng Chen, Zehan Zheng, Nanru Dai, Haojun Shi, Suyu Ye, Aydan Huang, Zheyuan Zhang, Alan Yuille, Jianwen Xie, Ayush Tewari, Tianmin Shu

TL;DR
This paper introduces 3D-Belief, a generative 3D world model that maintains and updates an agent's beliefs about unobserved environments, enhancing embodied reasoning and navigation under partial observability.
Contribution
The work presents a novel 3D belief inference framework that explicitly models uncertainty and updates beliefs online, improving scene understanding and task performance in embodied agents.
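As a rough illustration of what "explicitly models uncertainty and updates beliefs online" can mean, the sketch below runs a discrete Bayes update over a handful of hypotheses and then samples from the result. It is a toy under stated assumptions, not the paper's method: 3D-Belief is described as a generative 3D world model, whereas this example uses a plain dictionary of probabilities. The room names, the likelihood values, and the helpers `update_belief` and `sample_hypotheses` are all hypothetical.

```python
import random


def update_belief(prior, likelihoods):
    """One Bayes step: posterior is proportional to prior * likelihood."""
    unnormalized = {h: prior[h] * likelihoods[h] for h in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero likelihood under every hypothesis")
    return {h: p / total for h, p in unnormalized.items()}


def sample_hypotheses(belief, k, rng):
    """Draw k hypotheses in proportion to their current belief mass."""
    names = list(belief)
    weights = [belief[h] for h in names]
    return rng.choices(names, weights=weights, k=k)


# Hypothetical question: which unobserved room holds the target object?
belief = {"kitchen": 1 / 3, "bedroom": 1 / 3, "bathroom": 1 / 3}

# Hypothetical likelihood of each new observation under each hypothesis.
observations = [
    {"kitchen": 0.6, "bedroom": 0.3, "bathroom": 0.1},
    {"kitchen": 0.7, "bedroom": 0.2, "bathroom": 0.1},
]

# Sequential updating: the posterior after one observation is the next prior.
for likelihoods in observations:
    belief = update_belief(belief, likelihoods)

# Multi-hypothesis sampling: several guesses rather than a single best one.
samples = sample_hypotheses(belief, k=5, rng=random.Random(0))
```

The loop makes the belief depend on every observation seen so far, and sampling keeps less likely hypotheses in play instead of committing to the most probable one.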
Findings
3D-Belief improves scene memory and imagination quality.
It enhances object navigation in simulation and real-world environments.
It outperforms state-of-the-art methods in 3D belief inference and embodied tasks.
Abstract
Recent advances in visual generative models have highlighted the promise of learning generative world models. However, most existing approaches frame world modeling as novel-view synthesis or future-frame prediction, emphasizing visual realism rather than the structured uncertainty required by embodied agents acting under partial observability. In this work, we propose a different perspective: world modeling as embodied belief inference in 3D space. From this view, a world model should not merely render what may be seen, but maintain and update an agent's belief about the unobserved 3D world as new observations are acquired. We identify several key capabilities for such models, including spatially consistent scene memory, multi-hypothesis belief sampling, sequential belief updating, and semantically informed prediction of unseen regions. We instantiate these ideas in 3D-Belief, a…
