Backward explanations via redefinition of predicates
Léo Saulières, Martin C. Cooper, Florence Dupin de Saint Cyr

TL;DR
This paper introduces Backward-HXP, a novel method for explaining reinforcement learning histories by redefining predicates, enabling summaries of long interaction sequences without approximating action importance scores.
Contribution
The paper proposes Backward-HXP, a new approach that avoids score approximation in history explanations, improving the interpretability of long RL interaction sequences.
Findings
B-HXP explains long histories by displaying only their most important actions.
It does so without approximating action importance scores, whose exact computation is #W[1]-hard.
Experiments show that B-HXP can summarize long histories.
Abstract
History eXplanation based on Predicates (HXP) studies the behavior of a Reinforcement Learning (RL) agent in a sequence of the agent's interactions with the environment (a history), through the prism of an arbitrary predicate. To this end, an action importance score is computed for each action in the history. The explanation consists in displaying the most important actions to the user. As the calculation of an action's importance is #W[1]-hard, it is necessary for long histories to approximate the scores, at the expense of their quality. We therefore propose a new HXP method, called Backward-HXP (B-HXP), to provide explanations for these histories without having to approximate scores. Experiments show the ability of B-HXP to summarise long histories.
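To make "score each action, then display the most important ones" concrete, here is a minimal sketch on a toy environment. Everything in it is an assumption for illustration: the 1D slippery walk, the names `transitions`, `policy`, `predicate`, `utility`, `importance`, and `top_actions`, and in particular the score itself (probability of satisfying the predicate after taking the action and then following the policy for the rest of a k-step horizon, minus the average over the alternative actions). It is not claimed to be the paper's exact definition, and it does not implement the backward predicate redefinition of B-HXP.

```python
from functools import lru_cache

# Toy setting (assumed): positions 0..GOAL, actions move left or right,
# and each move fails with probability SLIP, leaving the agent in place.
ACTIONS = (-1, +1)
SLIP = 0.2
GOAL = 3

def transitions(state, action):
    """Return [(probability, next_state)] for taking `action` in `state`."""
    moved = max(0, min(GOAL, state + action))
    return [(1 - SLIP, moved), (SLIP, state)]

def policy(state):
    """Hypothetical agent policy: always try to move right."""
    return +1

def predicate(state):
    """Predicate under study: the agent is at the goal."""
    return state == GOAL

@lru_cache(maxsize=None)
def utility(state, k):
    """Probability that the predicate holds after k further policy steps."""
    if k == 0:
        return 1.0 if predicate(state) else 0.0
    return sum(p * utility(s2, k - 1)
               for p, s2 in transitions(state, policy(state)))

def action_value(state, action, k):
    """Utility of taking `action` now, then following the policy for k - 1 steps."""
    return sum(p * utility(s2, k - 1) for p, s2 in transitions(state, action))

def importance(state, action, k):
    """Assumed score: value of the chosen action minus the mean of the others."""
    others = [a for a in ACTIONS if a != action]
    baseline = sum(action_value(state, a, k) for a in others) / len(others)
    return action_value(state, action, k) - baseline

def top_actions(history, k, n):
    """Return the n highest-scoring (score, time step, state, action) entries."""
    scored = [(importance(s, a, k), t, s, a) for t, (s, a) in enumerate(history)]
    return sorted(scored, reverse=True)[:n]

history = [(0, +1), (1, +1), (2, +1)]
for score, t, s, a in top_actions(history, k=1, n=2):
    print(f"t={t} state={s} action={a:+d} importance={score:.2f}")
```

The toy state space is small enough for memoisation to keep the lookahead cheap. In general the number of reachable states can grow exponentially with the horizon k, which is the cost that motivates approximating scores on long histories.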
