BET: Explaining Deep Reinforcement Learning through The Error-Prone Decisions
Xiao Liu, Jie Zhao, Wubing Chen, Mao Tan, Yongxing Su

TL;DR
This paper introduces BET, a novel interpretable model that identifies error-prone states in deep reinforcement learning agents by analyzing decision consistency and state neighborhoods, improving explanation fidelity especially in complex environments.
Contribution
BET is a new self-interpretable structure that pinpoints error-prone states in DRL by modeling state neighborhoods and decision uniformity, advancing explainability in complex scenarios.
Findings
BET outperforms existing models in explanation fidelity.
BET effectively identifies error-prone states in various RL environments.
BET is the first to transparently explain complex multi-agent scenarios such as StarCraft II.
Abstract
Despite the impressive capabilities of Deep Reinforcement Learning (DRL) agents in many challenging scenarios, their black-box decision-making process significantly limits their deployment in safety-sensitive domains. Several previous self-interpretable works focus on revealing the critical states of the agent's decision. However, they cannot pinpoint the error-prone states. To address this issue, we propose a novel self-interpretable structure, named Backbone Extract Tree (BET), to better explain the agent's behavior by identifying the error-prone states. At a high level, BET hypothesizes that states in which the agent consistently executes uniform decisions exhibit a reduced propensity for errors. To effectively model this phenomenon, BET expresses these states within neighborhoods, each defined by a curated set of representative states. Therefore, states positioned at a greater distance…
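The neighborhood idea can be illustrated with a small sketch. This is not the authors' BET implementation: the abstract is cut off before it says what follows for distant states, so the code assumes, purely for illustration, that a state's Euclidean distance to its nearest representative state is read as a rough proxy for how error-prone it is. The function names `nearest_representative` and `rank_by_distance`, the distance metric, and the toy 2-D states are all hypothetical.

```python
import math


def nearest_representative(state, representatives):
    """Return (index, distance) of the representative state closest to `state`.

    Assumption: Euclidean distance; the source does not specify a metric.
    """
    distances = [math.dist(state, rep) for rep in representatives]
    idx = min(range(len(distances)), key=distances.__getitem__)
    return idx, distances[idx]


def rank_by_distance(states, representatives):
    """Sort states from farthest to nearest relative to their closest representative.

    Assumption: larger distance is treated as "more error-prone" for illustration only.
    """
    scored = [(nearest_representative(s, representatives)[1], s) for s in states]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Toy data: two hypothetical representative states and three observed states.
representatives = [(0.0, 0.0), (10.0, 10.0)]
states = [(0.0, 1.0), (5.0, 5.0), (10.0, 9.0)]
ranking = rank_by_distance(states, representatives)
```

In this toy setup, the state `(5.0, 5.0)` sits between both neighborhoods and therefore ranks first, while the two states next to a representative rank last.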
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Adversarial Robustness in Machine Learning
Methods: Sparse Evolutionary Training · Focus
