LLMs for Text-Based Exploration and Navigation Under Partial Observability
Stephan Sandfuchs, Maximilian Melchert, Jörg Frochte

TL;DR
This paper evaluates large language models as text-only controllers for exploration and navigation in unknown environments with partial observability, proposing a benchmark and analyzing their capabilities and limitations.
Contribution
It introduces a reproducible benchmark for LLM-based navigation, compares nine contemporary LLMs, and highlights the potential of reasoning-tuned models with few-shot prompts for partial map control.
Findings
Reasoning-tuned models reliably complete navigation tasks.
Few-shot prompts improve model performance and efficiency.
Training and test-time strategies better predict control ability than parameter size.
Abstract
Exploration and goal-directed navigation in unknown layouts are central to inspection, logistics, and search-and-rescue. We ask whether large language models (LLMs) can function as text-only controllers under partial observability, without code execution, tools, or program synthesis. We introduce a reproducible benchmark with oracle localisation in fixed ASCII gridworlds: each step reveals only a local window around the agent, and the model must select one of UP/RIGHT/DOWN/LEFT. Nine contemporary LLMs, spanning open and proprietary, dense and Mixture-of-Experts, and instruction- and reasoning-tuned models, are evaluated on two tasks across three layouts of increasing difficulty: Exploration (maximising revealed cells) and Navigation (reaching the goal on the shortest path). The experimental results are evaluated on quantitative metrics including success…
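To make the setup in the abstract concrete, here is a minimal sketch of a partially observable ASCII gridworld with the four actions UP/RIGHT/DOWN/LEFT. It is an illustration under stated assumptions, not the authors' benchmark: the layout, the symbols (`#`, `.`, `G`, `A`, `?`), the window radius, and all names (`GridWorld`, `coverage`, `observation`) are hypothetical choices made for this example.

```python
from typing import List, Set, Tuple

# Hypothetical layout, not taken from the paper: '#' wall, '.' free cell, 'G' goal.
LAYOUT = [
    "#######",
    "#.....#",
    "#.##..#",
    "#...#G#",
    "#######",
]

# The four actions named in the abstract, as (row, column) offsets.
MOVES = {"UP": (-1, 0), "RIGHT": (0, 1), "DOWN": (1, 0), "LEFT": (0, -1)}


class GridWorld:
    """Toy gridworld that reveals only a square window around the agent."""

    def __init__(self, layout: List[str], start: Tuple[int, int], radius: int = 1):
        self.grid = [list(row) for row in layout]
        self.pos = start          # oracle localisation: the position is always known
        self.radius = radius      # assumed window radius; the paper's value may differ
        self.revealed: Set[Tuple[int, int]] = set()
        self._reveal()

    def _in_bounds(self, r: int, c: int) -> bool:
        return 0 <= r < len(self.grid) and 0 <= c < len(self.grid[0])

    def _reveal(self) -> None:
        # Mark every in-bounds cell inside the local window as seen.
        r0, c0 = self.pos
        for r in range(r0 - self.radius, r0 + self.radius + 1):
            for c in range(c0 - self.radius, c0 + self.radius + 1):
                if self._in_bounds(r, c):
                    self.revealed.add((r, c))

    def step(self, action: str) -> bool:
        """Apply one action; moves into walls are ignored. Returns True on the goal."""
        dr, dc = MOVES[action]
        r, c = self.pos[0] + dr, self.pos[1] + dc
        if self._in_bounds(r, c) and self.grid[r][c] != "#":
            self.pos = (r, c)
            self._reveal()
        return self.grid[self.pos[0]][self.pos[1]] == "G"

    def observation(self) -> str:
        """Text view for the model: 'A' is the agent, '?' is an unseen cell."""
        lines = []
        for r, row in enumerate(self.grid):
            chars = []
            for c, ch in enumerate(row):
                if (r, c) == self.pos:
                    chars.append("A")
                elif (r, c) in self.revealed:
                    chars.append(ch)
                else:
                    chars.append("?")
            lines.append("".join(chars))
        return "\n".join(lines)

    def coverage(self) -> float:
        """Fraction of all cells revealed so far (one possible exploration score)."""
        return len(self.revealed) / (len(self.grid) * len(self.grid[0]))


env = GridWorld(LAYOUT, start=(1, 1))
print(env.observation())
for action in ["RIGHT", "RIGHT", "RIGHT", "RIGHT", "DOWN", "DOWN"]:
    reached_goal = env.step(action)
print(reached_goal, round(env.coverage(), 2))
```

In an evaluation loop of this kind, the string returned by `observation()` would be placed in the prompt and the model's reply parsed into one of the four action names. The accumulated partial map shown here is only one plausible encoding; whether the benchmark presents the accumulated map or only the current window is not stated in the excerpt above.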
