Abstract Counterfactuals for Language Model Agents

Edoardo Pona; Milad Kazemi; Yali Du; David Watson; Nicola Paoletti

arXiv:2506.02946 · cs.LG · June 4, 2025

TL;DR

This paper introduces Abstract Counterfactuals, a framework for high-level counterfactual reasoning in language model agents, addressing limitations of token-level methods by focusing on meaningful, environment-relevant features.

Contribution

The paper presents a novel framework for counterfactual inference in LM agents that emphasizes high-level features, improving interpretability and relevance over token-level approaches.

Findings

01 Produces consistent and meaningful counterfactuals

02 Reduces side effects compared to token-level methods

03 Effective in text-based games and counterfactual text generation

Abstract

Counterfactual inference is a powerful tool for analysing and evaluating autonomous agents, but its application to language model (LM) agents remains challenging. Existing work on counterfactuals in LMs has primarily focused on token-level counterfactuals, which are often inadequate for LM agents due to their open-ended action spaces. Unlike traditional agents with fixed, clearly defined action spaces, the actions of LM agents are often implicit in the strings they output, making their action spaces difficult to define and interpret. Furthermore, the meanings of individual tokens can shift depending on the context, adding complexity to token-level reasoning and sometimes leading to biased or meaningless counterfactuals. We introduce Abstract Counterfactuals, a framework that emphasises high-level characteristics of actions and interactions within an environment, enabling…

Videos

Abstract Counterfactuals for Language Model Agents · SlidesLive

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies