Can Large Language Models be Good Path Planners? A Benchmark and Investigation on Spatial-temporal Reasoning

Mohamed Aghzal; Erion Plaku; Ziyu Yao

arXiv:2310.03249 · cs.CL · February 25, 2025 · 6 cites

TL;DR

This paper introduces PPNL, a new benchmark for evaluating large language models' spatial-temporal reasoning in path planning tasks, revealing strengths and limitations of models like GPT-4 and fine-tuned LLMs.

Contribution

The paper presents PPNL, a novel benchmark for spatial-temporal reasoning in path planning, and systematically evaluates LLMs' performance, highlighting their capabilities and challenges.

Findings

1. Few-shot GPT-4 shows promise in spatial reasoning.
2. Fine-tuned LLMs excel in in-distribution tasks but struggle with larger environments.
3. GPT-4 still fails in long-term temporal reasoning.

Abstract

Large language models (LLMs) have achieved remarkable success across a wide spectrum of tasks; however, they still face limitations in scenarios that demand long-term planning and spatial reasoning. To facilitate this line of research, in this work, we propose a new benchmark, termed Path Planning from Natural Language (PPNL). Our benchmark evaluates LLMs' spatial-temporal reasoning by formulating "path planning" tasks that require an LLM to navigate to target locations while avoiding obstacles and adhering to constraints. Leveraging this benchmark, we systematically investigate LLMs including GPT-4 via different few-shot prompting methodologies as well as BART and T5 of various sizes via fine-tuning. Our experimental results show the promise of few-shot GPT-4 in spatial reasoning, when it is prompted to reason and act…
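To make the task formulation concrete, here is a minimal sketch of what a grid-based path planning instance and a correctness check might look like. It is an illustration only, not the benchmark's actual code or data format: the square grid, the 0-indexed (row, column) coordinates, the four action names, and the helper names `plan_path` and `is_valid_path` are all assumptions. Breadth-first search stands in as a reference planner against which a model-generated action sequence could be compared.

```python
from collections import deque

# Hypothetical action vocabulary; the benchmark's actual action names may differ.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def plan_path(size, obstacles, start, goal):
    """Breadth-first search for a shortest obstacle-free action sequence.

    Returns a list of action names, or None if the goal is unreachable.
    """
    blocked = set(obstacles)
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        (row, col), actions = queue.popleft()
        if (row, col) == goal:
            return actions
        for name, (d_row, d_col) in MOVES.items():
            nxt = (row + d_row, col + d_col)
            in_bounds = 0 <= nxt[0] < size and 0 <= nxt[1] < size
            if in_bounds and nxt not in blocked and nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, actions + [name]))
    return None


def is_valid_path(size, obstacles, start, goal, actions):
    """Check that actions stay in bounds, avoid obstacles, and end at the goal."""
    blocked = set(obstacles)
    row, col = start
    for name in actions:
        if name not in MOVES:
            return False
        d_row, d_col = MOVES[name]
        row, col = row + d_row, col + d_col
        if not (0 <= row < size and 0 <= col < size) or (row, col) in blocked:
            return False
    return (row, col) == goal
```

Under these assumptions, a model's output for an instance could be scored by passing its action sequence to `is_valid_path` and comparing its length with the sequence returned by `plan_path` to judge optimality.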

Code & Models

Repositories

mohamedaghzal/llms-as-path-planners (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications

Methods: Gated Linear Unit · Attention Is All You Need · Dropout · Attention Dropout · Dense Connections · Inverse Square Root Schedule · Linear Layer · Label Smoothing · SentencePiece · Adam