Zero-Shot Goal Recognition with Large Language Models
Kin Max Piamolini Gusmão, Nathan Gavenski, Nir Oren, Felipe Meneguzzi

TL;DR
This paper systematically evaluates large language models' ability to perform zero-shot goal recognition on classical planning benchmarks, revealing uneven competence and highlighting goal recognition as a key benchmark for LLM planning knowledge.
Contribution
It provides the first systematic zero-shot evaluation of LLMs as goal recognisers on classical planning benchmarks, and analyses how well they integrate observed evidence.
Findings
Some models approach landmark-based accuracy with full evidence
Other models rely on world-knowledge priors regardless of evidence
Divergence in reasoning reflects evidence integration differences
Abstract
Large language models have recently reached near-parity with classical planners on well-known planning domains, yet this competence relies on world-knowledge exploitation rather than genuine symbolic reasoning. Goal recognition is a complementary abductive task structurally better suited to LLM strengths: it consists of evaluating consistency with world knowledge rather than generating novel action sequences. This paper provides the first systematic zero-shot evaluation of frontier LLMs as goal recognisers on key classical PDDL benchmarks. Our results show that LLM competence on goal recognition is uneven: some models scale with evidence and approach landmark-based accuracy at full observations, while others remain anchored to world-knowledge priors regardless of how much evidence accumulates. Qualitative analysis of model reasoning traces reveals that this divergence reflects a…
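The abstract uses landmark-based goal recognition as the reference point for LLM accuracy. The sketch below is for illustration only. It shows a simplified, flat landmark-completion score in the spirit of landmark-based recognisers. Each candidate goal is scored by the fraction of its fact landmarks that the observations have already achieved, and the top-scoring goals are returned. The goal names, landmark sets, and function names are hypothetical. Landmark extraction is assumed to have been done beforehand. This is not necessarily the exact baseline the paper evaluates.

```python
from typing import Dict, List, Set


def completion_score(goal_landmarks: Set[str], achieved: Set[str]) -> float:
    """Fraction of a goal's fact landmarks already achieved by the observations."""
    if not goal_landmarks:
        return 0.0
    return len(goal_landmarks & achieved) / len(goal_landmarks)


def recognise(landmarks: Dict[str, Set[str]], observed_facts: List[Set[str]]) -> List[str]:
    """Return the candidate goals with the highest landmark completion score.

    landmarks: hypothetical mapping from goal name to its fact landmarks,
        assumed to be pre-extracted by some landmark extraction algorithm.
    observed_facts: facts made true by each observed action, in order.
    """
    # Union of all facts seen so far (empty set if there are no observations).
    achieved: Set[str] = set().union(*observed_facts)
    scores = {goal: completion_score(lms, achieved) for goal, lms in landmarks.items()}
    if not scores:
        return []
    best = max(scores.values())
    # Ties are kept: several goals may be equally consistent with the evidence.
    return sorted(goal for goal, s in scores.items() if s == best)


if __name__ == "__main__":
    # Hypothetical Blocks World style candidate goals and landmarks.
    landmarks = {
        "on(a,b)": {"holding(a)", "clear(b)", "on(a,b)"},
        "on(a,c)": {"holding(a)", "clear(c)", "on(a,c)"},
        "on(c,b)": {"holding(c)", "clear(b)", "on(c,b)"},
    }
    observations = [{"holding(a)"}, {"clear(b)"}]
    print(recognise(landmarks, observations))  # prints ['on(a,b)']
```

With no observations, every goal scores zero and all candidates tie. As observations accumulate, the score separates the goals. This mirrors the evidence-scaling behaviour the abstract contrasts with models that stay anchored to priors.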
