LLMs for Text-Based Exploration and Navigation Under Partial Observability

Stephan Sandfuchs; Maximilian Melchert; Jörg Frochte

arXiv:2604.09604 · cs.AI · April 14, 2026


TL;DR

This paper evaluates large language models as text-only controllers for exploration and navigation in unknown environments with partial observability, proposing a benchmark and analyzing their capabilities and limitations.

Contribution

It introduces a reproducible benchmark for LLM-based navigation, compares various models, and highlights the potential of reasoning-tuned models with few-shot prompts for partial map control.

Findings

01. Reasoning-tuned models reliably complete navigation tasks.

02. Few-shot prompts improve model performance and efficiency.

03. Training and test-time strategies better predict control ability than parameter size.

Abstract

Exploration and goal-directed navigation in unknown layouts are central to inspection, logistics, and search-and-rescue. We ask whether large language models (LLMs) can function as text-only controllers under partial observability, without code execution, tools, or program synthesis. We introduce a reproducible benchmark with oracle localisation in fixed ASCII gridworlds: each step reveals only a local 5 × 5 window around the agent, and the model must select one of UP/RIGHT/DOWN/LEFT. Nine contemporary LLMs, spanning open and proprietary, dense and Mixture-of-Experts, and instruction- and reasoning-tuned models, are evaluated on two tasks across three layouts of increasing difficulty: Exploration (maximising revealed cells) and Navigation (reaching the goal along the shortest path). The experimental results are evaluated on quantitative metrics including success…
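To make the setup in the abstract concrete, the following is a minimal sketch of a partially observable ASCII gridworld step: it renders the local 5 × 5 window around the agent, applies one of the four actions, and tracks which cells have been revealed. The cell symbols, the example layout, the function names, the choice to render out-of-bounds cells as walls, and the rule that a blocked move leaves the agent in place are all assumptions made for illustration; this excerpt does not specify the benchmark's actual encoding or prompt format.

```python
from typing import Dict, List, Set, Tuple

Pos = Tuple[int, int]  # (row, column)

# Hypothetical symbols; the benchmark's real ASCII encoding is not given here.
WALL, AGENT = "#", "A"

# The four actions named in the abstract, as (row, column) offsets.
MOVES: Dict[str, Pos] = {
    "UP": (-1, 0),
    "RIGHT": (0, 1),
    "DOWN": (1, 0),
    "LEFT": (0, -1),
}


def in_bounds(grid: List[str], r: int, c: int) -> bool:
    return 0 <= r < len(grid) and 0 <= c < len(grid[0])


def local_window(grid: List[str], pos: Pos, size: int = 5) -> List[str]:
    """Return the size x size text window centred on the agent.

    Assumption: cells outside the map are rendered as walls.
    """
    half = size // 2
    rows = []
    for r in range(pos[0] - half, pos[0] + half + 1):
        row = []
        for c in range(pos[1] - half, pos[1] + half + 1):
            if (r, c) == pos:
                row.append(AGENT)
            elif in_bounds(grid, r, c):
                row.append(grid[r][c])
            else:
                row.append(WALL)
        rows.append("".join(row))
    return rows


def step(grid: List[str], pos: Pos, action: str) -> Pos:
    """Apply an action. Assumption: a blocked move leaves the agent in place."""
    dr, dc = MOVES[action]
    r, c = pos[0] + dr, pos[1] + dc
    if in_bounds(grid, r, c) and grid[r][c] != WALL:
        return (r, c)
    return pos


def revealed(grid: List[str], pos: Pos, size: int = 5) -> Set[Pos]:
    """In-bounds cells covered by the window at this position."""
    half = size // 2
    return {
        (r, c)
        for r in range(pos[0] - half, pos[0] + half + 1)
        for c in range(pos[1] - half, pos[1] + half + 1)
        if in_bounds(grid, r, c)
    }


# Hypothetical 5 x 7 layout, not one of the paper's three layouts.
GRID = [
    "#######",
    "#.....#",
    "#.#...#",
    "#.....#",
    "#######",
]

start: Pos = (1, 1)
seen = revealed(GRID, start)
window = local_window(GRID, start)
nxt = step(GRID, start, "RIGHT")
seen |= revealed(GRID, nxt)
```

In a loop of this shape, the text of `window` (together with the agent's known position, since localisation is given by an oracle) would be placed in the prompt, the model's reply would be parsed into one of the four action names, and the size of `seen` would serve as a simple stand-in for an exploration score.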
