World2Mind: Cognition Toolkit for Allocentric Spatial Reasoning in Foundation Models
Shouwei Ruan, Bin Wang, Zhenyu Wu, Qihui Zhu, Yuxiang Zhang, Hang Su, Yubin Wang

TL;DR
World2Mind introduces a spatial reasoning toolkit that constructs structured 3D cognitive maps, enabling foundation models to perform robust allocentric spatial reasoning without additional training, significantly improving their accuracy and generalization.
Contribution
The paper presents a training-free spatial reasoning framework that uses 3D reconstruction and structured cognitive maps to enhance foundation models' spatial understanding.
Findings
Boosts GPT-5.2 performance by 5-18%.
Purely text-based models approach multimodal model performance in spatial reasoning.
Enables models to perform complex 3D reasoning using structured spatial maps.
Abstract
Achieving robust spatial reasoning remains a fundamental challenge for current Multimodal Foundation Models (MFMs). Existing methods either overfit statistical shortcuts via 3D grounding data or remain confined to 2D visual perception, limiting both spatial reasoning accuracy and generalization in unseen scenarios. Inspired by the spatial cognitive mapping mechanisms of biological intelligence, we propose World2Mind, a training-free spatial intelligence toolkit. At its core, World2Mind leverages 3D reconstruction and instance segmentation models to construct structured spatial cognitive maps, empowering MFMs to proactively acquire targeted spatial knowledge regarding landmarks and routes of interest. To provide robust geometric-topological priors, World2Mind synthesizes an Allocentric-Spatial Tree (AST) that uses elliptical parameters to model the top-down layout of landmarks…
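The abstract excerpt does not say how the AST derives its elliptical parameters, so the following is only an illustrative sketch of one common way to summarize a landmark's top-down footprint as an ellipse: project the instance's 3D points onto the ground plane and read the center, semi-axes, and orientation off the 2D covariance. The names `LandmarkEllipse` and `fit_topdown_ellipse`, the z-up convention, and the `scale` factor are all assumptions made for this example, not details taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class LandmarkEllipse:
    """Hypothetical top-down ellipse summary of one landmark."""
    cx: float          # ellipse center, x
    cy: float          # ellipse center, y
    semi_major: float  # half-length along the major axis
    semi_minor: float  # half-length along the minor axis
    theta: float       # major-axis orientation in radians, in (-pi/2, pi/2]


def fit_topdown_ellipse(points_3d, scale=2.0):
    """Fit an ellipse to the (x, y) projection of 3D points.

    Assumes z is the vertical axis, so dropping z gives the top-down view.
    `scale` (an assumed parameter) converts standard deviations along the
    principal axes into semi-axis lengths.
    """
    n = len(points_3d)
    if n < 2:
        raise ValueError("need at least two points to fit an ellipse")

    xs = [p[0] for p in points_3d]
    ys = [p[1] for p in points_3d]
    cx = sum(xs) / n
    cy = sum(ys) / n

    # Entries of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - cx) ** 2 for x in xs) / n
    syy = sum((y - cy) ** 2 for y in ys) / n
    sxy = sum((x - cx) * (y - cy) for x, y in zip(xs, ys)) / n

    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    mean = (sxx + syy) / 2.0
    diff = math.hypot((sxx - syy) / 2.0, sxy)
    lam_max = mean + diff
    lam_min = max(mean - diff, 0.0)  # clamp tiny negatives from rounding

    # Orientation of the principal (major) axis.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)

    return LandmarkEllipse(
        cx, cy, scale * math.sqrt(lam_max), scale * math.sqrt(lam_min), theta
    )


if __name__ == "__main__":
    # Four points at the corners of a 4 x 2 footprint; heights are ignored.
    table = [(2, 1, 0.0), (2, -1, 0.5), (-2, 1, 1.0), (-2, -1, 0.2)]
    print(fit_topdown_ellipse(table))
```

A compact parameterization like this (center, two semi-axes, one angle) gives a text-only model five numbers per landmark to reason over, which is one plausible reason an elliptical representation suits a structured map passed to a language model.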
Taxonomy
Topics: Constraint Satisfaction and Optimization · Spatial Cognition and Navigation · Multimodal Machine Learning Applications
