Zero-Shot Goal Recognition with Large Language Models

Kin Max Piamolini Gusmão; Nathan Gavenski; Nir Oren; Felipe Meneguzzi

arXiv:2605.15333·cs.AI·May 18, 2026

TL;DR

This paper systematically evaluates large language models' ability to perform zero-shot goal recognition on classical planning benchmarks, revealing uneven competence and highlighting goal recognition as a key benchmark for LLM planning knowledge.

Contribution

The paper provides the first systematic zero-shot evaluation of LLMs as goal recognisers on classical PDDL benchmarks, and analyses how well each model integrates accumulating evidence.

Findings

01. Some models approach landmark-based accuracy with full evidence

02. Other models rely on world-knowledge priors regardless of evidence

03. Divergence in reasoning reflects evidence integration differences

Abstract

Large language models have recently reached near-parity with classical planners on well-known planning domains, yet this competence relies on world-knowledge exploitation rather than genuine symbolic reasoning. Goal recognition is a complementary abductive task structurally better suited to LLM strengths: it consists of evaluating consistency with world knowledge rather than generating novel action sequences. This paper provides the first systematic zero-shot evaluation of frontier LLMs as goal recognisers on key classical PDDL benchmarks. Our results show that LLM competence on goal recognition is uneven: some models scale with evidence and approach landmark-based accuracy at full observations, while others remain anchored to world-knowledge priors regardless of how much evidence accumulates. Qualitative analysis of model reasoning traces reveals that this divergence reflects a…
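
To make the task in the abstract concrete, the following is a minimal sketch of how goal recognition accuracy can be measured as evidence accumulates. It is not the authors' evaluation code: the problem format, the function names (`truncate`, `overlap_recogniser`, `accuracy_by_observability`) and the observability fractions are all assumptions made for illustration, and the token-overlap recogniser is only a toy stand-in for an LLM or a landmark-based recogniser.

```python
from typing import Callable, Dict, List, Sequence

# A recogniser takes candidate goals and observed actions, returns one goal.
Recogniser = Callable[[Sequence[str], Sequence[str]], str]


def truncate(observations: Sequence[str], fraction: float) -> List[str]:
    """Keep the first `fraction` of the observed actions (at least one)."""
    keep = max(1, round(len(observations) * fraction))
    return list(observations[:keep])


def _tokens(text: str) -> set:
    """Split a PDDL-like string into a set of bare tokens."""
    return set(text.replace("(", " ").replace(")", " ").split())


def overlap_recogniser(candidates: Sequence[str], observations: Sequence[str]) -> str:
    """Toy recogniser (hypothetical): pick the candidate goal that shares
    the most tokens with the observations. Ties go to the first candidate."""
    obs_tokens = _tokens(" ".join(observations))
    return max(candidates, key=lambda goal: len(_tokens(goal) & obs_tokens))


def accuracy_by_observability(
    problems: Sequence[dict],
    recogniser: Recogniser,
    fractions: Sequence[float] = (0.25, 0.5, 0.75, 1.0),  # arbitrary levels
) -> Dict[float, float]:
    """Fraction of problems whose true goal is recovered at each level."""
    results = {}
    for fraction in fractions:
        correct = sum(
            recogniser(p["candidates"], truncate(p["observations"], fraction))
            == p["true_goal"]
            for p in problems
        )
        results[fraction] = correct / len(problems)
    return results


if __name__ == "__main__":
    # A single invented Blocksworld-style problem, for illustration only.
    problems = [
        {
            "candidates": ["(on a b)", "(on a c)"],
            "observations": ["(pick-up a)", "(stack a c)"],
            "true_goal": "(on a c)",
        }
    ]
    print(accuracy_by_observability(problems, overlap_recogniser, (0.5, 1.0)))
```

In this sketch a recogniser that "scales with evidence" would show accuracy rising as the fraction grows, whereas one anchored to a prior would return the same goal whatever prefix of the observations it is given. Swapping `overlap_recogniser` for a function that queries a model is the only change needed to reuse the harness.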
