From Perception to Action: Spatial AI Agents and World Models
Gloria Felicia, Nolan Bryant, Handi Putra, Ayaan Gazali, Eliel Lobo, Esteban Rojas

TL;DR
This paper introduces a unified framework connecting agentic capabilities with spatial reasoning, emphasizing the importance of world models and hierarchical memory for embodied AI in physical environments.
Contribution
The paper provides a taxonomy linking spatial perception, reasoning, and action, bridging the gap between existing surveys that treat these areas in isolation, and proposes directions for future research.
Findings
Hierarchical memory systems enhance long-horizon spatial tasks
GNN-LLM integration improves structured spatial reasoning
World models are crucial for safe deployment across scales
Abstract
While large language models have become the prevailing approach for agentic reasoning and planning, their success in symbolic domains does not readily translate to the physical world. Spatial intelligence, the ability to perceive 3D structure, reason about object relationships, and act under physical constraints, is an orthogonal capability that proves important for embodied agents. Existing surveys address either agentic architectures or spatial domains in isolation. None provide a unified framework connecting these complementary capabilities. This paper bridges that gap. Through a thorough review of over 2,000 papers, citing 742 works from top-tier venues, we introduce a unified three-axis taxonomy connecting agentic capabilities with spatial tasks across scales. Crucially, we distinguish spatial grounding (metric understanding of geometry and physics) from symbolic grounding…
Taxonomy
Topics: Spatial Cognition and Navigation · Multimodal Machine Learning Applications · Constraint Satisfaction and Optimization
