Mind's Eye of LLMs: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models
Wenshan Wu, Shaoguang Mao, Yadong Zhang, Yan Xia, Li Dong, Lei Cui, Furu Wei

TL;DR
This paper introduces Visualization-of-Thought (VoT), a prompting method that visualizes reasoning traces in large language models to improve their spatial reasoning abilities across various tasks.
Contribution
The paper proposes VoT, a novel prompting technique that enhances LLMs' spatial reasoning by visualizing their reasoning process, inspired by human mental imagery.
Findings
VoT significantly improves LLMs' performance on spatial reasoning tasks.
VoT outperforms existing multimodal models in navigation and grid tasks.
The approach suggests potential for mental image generation in LLMs.
Abstract
Large language models (LLMs) have exhibited impressive performance in language comprehension and various reasoning tasks. However, their abilities in spatial reasoning, a crucial aspect of human cognition, remain relatively unexplored. Humans possess a remarkable ability to create mental images of unseen objects and actions through a process known as the Mind's Eye, enabling the imagination of the unseen world. Inspired by this cognitive capacity, we propose Visualization-of-Thought (VoT) prompting. VoT aims to elicit spatial reasoning of LLMs by visualizing their reasoning traces, thereby guiding subsequent reasoning steps. We employed VoT for multi-hop spatial reasoning tasks, including natural language navigation, visual navigation, and visual tiling in 2D grid worlds. Experimental results demonstrated that VoT significantly enhances the spatial reasoning abilities of LLMs. Notably,…
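To make the idea of a "visualized reasoning trace" concrete, here is a minimal sketch of what interleaving text-rendered states with reasoning steps could look like in a 2D grid world. It is an illustration under stated assumptions, not the authors' implementation: the instruction wording in `VOT_INSTRUCTION`, the grid symbols, and the helper names `render`, `trace`, and `build_vot_prompt` are all hypothetical. In VoT the model itself would produce such frames; the code only simulates the kind of output the prompt asks for.

```python
from typing import List, Tuple

# Hypothetical instruction wording (an assumption, not quoted from the paper).
VOT_INSTRUCTION = "Visualize the state after each reasoning step."

# Row/column offsets for each move in the grid world.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def render(rows: int, cols: int, pos: Tuple[int, int]) -> str:
    """Draw the grid as text: 'A' marks the agent, '.' marks an empty cell."""
    return "\n".join(
        " ".join("A" if (r, c) == pos else "." for c in range(cols))
        for r in range(rows)
    )


def trace(rows: int, cols: int, start: Tuple[int, int], moves: List[str]) -> List[str]:
    """Return one text frame for the start state and one after each move."""
    pos = start
    frames = [render(rows, cols, pos)]
    for move in moves:
        dr, dc = MOVES[move]
        nr, nc = pos[0] + dr, pos[1] + dc
        if not (0 <= nr < rows and 0 <= nc < cols):
            raise ValueError(f"move {move!r} leaves the grid from {pos}")
        pos = (nr, nc)
        frames.append(render(rows, cols, pos))
    return frames


def build_vot_prompt(task: str) -> str:
    """Append the (hypothetical) visualization instruction to a task description."""
    return f"{task}\n{VOT_INSTRUCTION}"


if __name__ == "__main__":
    print(build_vot_prompt("Start at the top-left cell. Move right, then down."))
    for step, frame in enumerate(trace(2, 3, (0, 0), ["right", "down"])):
        print(f"\nState after step {step}:\n{frame}")
```

Each frame plays the role of a mental image that the next reasoning step can condition on, which is the mechanism the abstract describes as guiding subsequent reasoning steps.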
Taxonomy
Topics: Semantic Web and Ontologies · Geographic Information Systems Studies · Constraint Satisfaction and Optimization
