Mind over Space: Can Multimodal Large Language Models Mentally Navigate?
Qihui Zhu, Shouwei Ruan, Xiao Yang, Hao Jiang, Yao Huang, Shiji Zhao, Hanwei Fan, Hang Su, Xingxing Wei

TL;DR
This paper introduces Video2Mental, a benchmark for evaluating mental navigation in multimodal large language models, and proposes NavMind, a reasoning model that significantly improves spatial planning and navigation over existing models.
Contribution
The paper presents a new benchmark for mental navigation in MLLMs and introduces NavMind, a novel reasoning model with explicit cognitive maps that enhances spatial reasoning and planning.
Findings
Standard pre-training does not induce mental navigation capabilities.
NavMind outperforms existing models in structured spatial representation.
NavMind maintains higher planning accuracy over extended horizons.
Abstract
Despite the widespread adoption of MLLMs in embodied agents, their capabilities remain largely confined to reactive planning from immediate observations, consistently failing at spatial reasoning across extensive spatiotemporal scales. Cognitive science reveals that Biological Intelligence (BI) thrives on "mental navigation": the strategic construction of spatial representations from experience and the subsequent mental simulation of paths prior to action. To bridge the gap between AI and BI, we introduce Video2Mental, a pioneering benchmark for evaluating the mental navigation capabilities of MLLMs. The task requires constructing hierarchical cognitive maps from long egocentric videos and generating landmark-based path plans step by step, with planning accuracy verified through simulator-based physical interaction. Our benchmarking results reveal that mental navigation capability does…
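The abstract describes the task only at a high level: build a hierarchical cognitive map, then produce a landmark-based path plan step by step. The sketch below illustrates that idea under stated assumptions. It is hypothetical and is neither the Video2Mental data format nor NavMind's method. It assumes a two-level map in which regions (e.g. rooms) contain landmarks, and landmarks are linked by adjacency edges. It uses plain breadth-first search as a stand-in planner. The names `CognitiveMap`, `plan_path`, and `to_instructions` are invented for illustration. The simulator-based verification step is not modeled.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class CognitiveMap:
    # Hypothetical hierarchy: region name -> landmarks observed in that region.
    regions: dict[str, list[str]] = field(default_factory=dict)
    # Hypothetical undirected adjacency between directly reachable landmarks.
    edges: dict[str, set[str]] = field(default_factory=dict)

    def add_landmark(self, region: str, landmark: str) -> None:
        self.regions.setdefault(region, []).append(landmark)
        self.edges.setdefault(landmark, set())

    def connect(self, a: str, b: str) -> None:
        self.edges.setdefault(a, set()).add(b)
        self.edges.setdefault(b, set()).add(a)

    def region_of(self, landmark: str) -> str | None:
        for region, landmarks in self.regions.items():
            if landmark in landmarks:
                return region
        return None


def plan_path(cmap: CognitiveMap, start: str, goal: str) -> list[str] | None:
    """Breadth-first search: fewest-hop landmark sequence, or None if unreachable."""
    parents: dict[str, str | None] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk parent links back to the start, then reverse.
            path: list[str] = []
            cur: str | None = node
            while cur is not None:
                path.append(cur)
                cur = parents[cur]
            return path[::-1]
        for nxt in sorted(cmap.edges.get(node, ())):  # sorted for determinism
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


def to_instructions(cmap: CognitiveMap, path: list[str]) -> list[str]:
    """Turn a landmark path into step-by-step text, noting region changes."""
    steps = []
    for a, b in zip(path, path[1:]):
        ra, rb = cmap.region_of(a), cmap.region_of(b)
        if ra != rb:
            steps.append(f"Leave the {ra} and go to the {b} in the {rb}")
        else:
            steps.append(f"Walk from the {a} to the {b}")
    return steps


if __name__ == "__main__":
    cmap = CognitiveMap()
    cmap.add_landmark("kitchen", "fridge")
    cmap.add_landmark("kitchen", "sink")
    cmap.add_landmark("hallway", "front door")
    cmap.add_landmark("bedroom", "bed")
    cmap.connect("fridge", "sink")
    cmap.connect("sink", "front door")
    cmap.connect("front door", "bed")
    route = plan_path(cmap, "fridge", "bed")
    print(route)  # -> ['fridge', 'sink', 'front door', 'bed']
    for step in to_instructions(cmap, route):
        print(step)
```

Keeping regions separate from landmark edges mirrors the "hierarchical" framing in the abstract: the planner searches over landmarks, and the region layer is used only to phrase the plan (e.g. "Leave the kitchen…"). A real system would need far richer geometry than hop counts.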
Taxonomy
Topics: Spatial Cognition and Navigation · Memory and Neural Mechanisms · Constraint Satisfaction and Optimization
