Counterfactual Explanations as Plans

Vaishak Belle (University of Edinburgh)

arXiv:2502.09205 · cs.AI · February 14, 2025 · ICLP

TL;DR

This paper formalizes counterfactual explanations as action sequences within AI planning, providing a framework for model reconciliation and understanding agent beliefs in various informational settings.

Contribution

It introduces a formal account of counterfactual explanations as plans using situation calculus, extending to different belief and knowledge scenarios.

Findings

01

Counterfactual explanations can be modeled as action sequences.

02

The framework generalizes to agents with partial, weakened, or false beliefs.

03

Model reconciliation can be achieved through suggested actions or model corrections.
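To make the first finding concrete, here is a minimal sketch of a counterfactual explanation computed as a plan. It is not the paper's situation-calculus formalization: it assumes a STRIPS-style encoding with set-based states and uses breadth-first search to find a shortest action sequence after which a currently false goal holds. The loan domain, the action names, and the function `counterfactual_plan` are all hypothetical and chosen only for illustration.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Tuple

State = FrozenSet[str]
# Each action is (preconditions, add effects, delete effects).
Action = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]


def apply(state: State, action: Action) -> Optional[State]:
    """Return the successor state, or None if the preconditions fail."""
    pre, add, delete = action
    if not pre <= state:
        return None
    return (state - delete) | add


def counterfactual_plan(
    start: State,
    actions: Dict[str, Action],
    goal: FrozenSet[str],
    max_depth: int = 5,
) -> Optional[List[str]]:
    """Breadth-first search for a shortest action sequence that makes goal true."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, plan = queue.popleft()
        if goal <= state:
            return plan
        if len(plan) >= max_depth:
            continue
        for name, action in actions.items():
            nxt = apply(state, action)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, plan + [name]))
    return None


# Hypothetical toy domain (an assumption, not taken from the paper).
ACTIONS: Dict[str, Action] = {
    "pay_debt": (frozenset({"has_debt"}), frozenset({"debt_free"}), frozenset({"has_debt"})),
    "raise_income": (frozenset(), frozenset({"high_income"}), frozenset()),
    "apply_loan": (
        frozenset({"debt_free", "high_income"}),
        frozenset({"loan_approved"}),
        frozenset(),
    ),
}

start = frozenset({"has_debt"})
goal = frozenset({"loan_approved"})
plan = counterfactual_plan(start, ACTIONS, goal)
print(plan)
```

Read as an explanation, the returned sequence answers "what would have had to be done for the goal to hold?". Handling partial, weakened, or false beliefs, as in the second finding, would require distinguishing the agent's model from the true one, which this sketch does not attempt.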

Abstract

There has been considerable recent interest in explainability in AI, especially with black-box machine learning models. As correctly observed by the planning community, when the application at hand is not a single-shot decision or prediction, but a sequence of actions that depend on observations, a richer notion of explanation is desirable. In this paper, we look to provide a formal account of "counterfactual explanations" in terms of action sequences. We then show that this naturally leads to an account of model reconciliation, which might take the form of the user correcting the agent's model, or suggesting actions for the agent's plan. For this, we will need to articulate what is true versus what is known, and we appeal to a modal fragment of the situation calculus to formalise these intuitions. We consider various settings: the agent knowing partial truths, weakened…
