The Limits of Predicting Agents from Behaviour

Alexis Bellot; Jonathan Richens; Tom Everitt

arXiv:2506.02923·cs.AI·June 5, 2025

Open Access

TL;DR

This paper establishes theoretical bounds on how well an agent's behaviour can be predicted from its observed actions, highlighting fundamental limits on inferring beliefs and intentions from behavioural data alone, with implications for AI safety and fairness.

Contribution

The paper derives novel theoretical bounds on an agent's behaviour in unseen environments from behavioural data, clarifying what can and cannot be inferred from observed actions.

Findings

01 Derived bounds on behaviour prediction in new environments

02 Identified fundamental limitations in inferring beliefs from behaviour

03 Discussed implications for AI safety and fairness

Abstract

As the complexity of AI systems and their interactions with the world increases, generating explanations for their behaviour is important for safely deploying AI. For agents, the most natural abstractions for predicting behaviour attribute beliefs, intentions and goals to the system. If an agent behaves as if it has a certain goal or belief, then we can make reasonable predictions about how it will behave in novel situations, including those where comprehensive safety evaluations are untenable. How well can we infer an agent's beliefs from their behaviour, and how reliably can these inferred beliefs predict the agent's behaviour in novel situations? We provide a precise answer to this question under the assumption that the agent's behaviour is guided by a world model. Our contribution is the derivation of novel bounds on the agent's behaviour in new (unseen) deployment environments,…

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI