A Behavioural and Representational Evaluation of Goal-Directedness in Language Model Agents

Raghu Arghal; Fade Chen; Niall Dalton; Evgenii Kortukov; Calum McNamara; Angelos Nalmpantis; Moksh Nirvaan; Gabriele Sarti; Mario Giulianelli

arXiv:2602.08964 · cs.LG · February 10, 2026

TL;DR

This paper introduces a framework combining behavioral testing and interpretability analysis to evaluate goal-directedness in language model agents, demonstrated through a 2D grid navigation task.

Contribution

The paper presents a novel methodology for assessing goal-directed behavior that integrates behavioral performance metrics with decoding of the agent's internal representations.

Findings

01

Performance scales with task difficulty while remaining robust to difficulty-preserving transformations and complex goal structures.

02

The agent non-linearly encodes a coarse spatial map of the environment.

03

The agent's actions align with its internal representations of the environment.

Abstract

Understanding an agent's goals helps explain and predict its behaviour, yet there is no established methodology for reliably attributing goals to agentic systems. We propose a framework for evaluating goal-directedness that integrates behavioural evaluation with interpretability-based analyses of models' internal representations. As a case study, we examine an LLM agent navigating a 2D grid world toward a goal state. Behaviourally, we evaluate the agent against an optimal policy across varying grid sizes, obstacle densities, and goal structures, finding that performance scales with task difficulty while remaining robust to difficulty-preserving transformations and complex goal structures. We then use probing methods to decode the agent's internal representations of the environment state and its multi-step action plans. We find that the LLM agent non-linearly encodes a coarse spatial map…
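The abstract does not say exactly how the agent is scored against the optimal policy. Below is a minimal sketch, assuming a 4-connected grid with unit-cost moves, an optimal policy given by breadth-first search, and an "optimality ratio" (shortest-path length divided by the agent's step count) as the metric. The helpers `shortest_path_length`, `optimality_ratio` and `rotate90` are hypothetical, and a 90° rotation is used only as one illustrative example of a difficulty-preserving transformation, not necessarily one the authors apply.

```python
from collections import deque

# Hypothetical grid encoding: '#' = obstacle, '.' = free, 'S' = start, 'G' = goal.

def find(grid, ch):
    """Return the (row, col) of the first occurrence of ch."""
    for r, row in enumerate(grid):
        c = row.find(ch)
        if c != -1:
            return (r, c)
    raise ValueError(f"{ch!r} not in grid")

def shortest_path_length(grid):
    """BFS over a 4-connected grid with unit move cost; None if the goal is unreachable."""
    start, goal = find(grid, "S"), find(grid, "G")
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return dist[(r, c)]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != "#" and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return None

def optimality_ratio(grid, agent_steps):
    """Optimal path length over the agent's step count (1.0 means the agent was optimal)."""
    opt = shortest_path_length(grid)
    if opt is None:
        return None  # no optimal path exists to compare against
    if agent_steps < opt:
        raise ValueError("agent cannot beat the shortest path")
    return 1.0 if agent_steps == 0 else opt / agent_steps

def rotate90(grid):
    """Rotate the grid 90 degrees clockwise; shortest-path lengths are unchanged."""
    return ["".join(row[c] for row in reversed(grid)) for c in range(len(grid[0]))]

if __name__ == "__main__":
    grid = ["S.#.", "..#.", "...G"]
    print(shortest_path_length(grid))            # 5
    print(shortest_path_length(rotate90(grid)))  # 5: rotation preserves difficulty
    print(round(optimality_ratio(grid, 7), 3))   # 0.714
```

Because a rotation leaves the obstacle layout and the start-goal distance intact, an agent whose optimality ratio drops sharply on the rotated grid would be responding to surface features rather than to the goal.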


Taxonomy

Topics: Multimodal Machine Learning Applications · Multi-Agent Systems and Negotiation · Language and cultural evolution