Integrating Policy Summaries with Reward Decomposition for Explaining Reinforcement Learning Agents
Yael Septon, Tobias Huber, Elisabeth André, Ofra Amir

TL;DR
This paper combines reward decomposition, a local explanation method, with HIGHLIGHTS, a global policy summary method, to help users understand reinforcement learning agents. Two user studies evaluate the combination and report improved interpretability.
Contribution
It introduces a combined explanation framework for RL agents that integrates reward decomposition of individual decisions with HIGHLIGHTS summaries of the agent's global behavior.
Findings
Reward decomposition helps identify agent priorities.
Global highlights improve understanding when preferences are similar.
Combined explanations enhance interpretability in RL agents.
Abstract
Explaining the behavior of reinforcement learning agents operating in sequential decision-making settings is challenging, as their behavior is affected by a dynamic environment and delayed rewards. Methods that help users understand the behavior of such agents can roughly be divided into local explanations that analyze specific decisions of the agents and global explanations that convey the general strategy of the agents. In this work, we study a novel combination of local and global explanations for reinforcement learning agents. Specifically, we combine reward decomposition, a local explanation method that exposes which components of the reward function influenced a specific decision, and HIGHLIGHTS, a global explanation method that shows a summary of the agent's behavior in decisive states. We conducted two user studies to evaluate the integration of these explanation methods and…
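To make the combination described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the agent exposes one Q-value per reward component and action, that the overall Q-value is the sum over components, and that a state's importance is the gap between its best and worst action value, as HIGHLIGHTS is commonly described. The component names, the helper functions, and all numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical reward components, for illustration only.
COMPONENTS = ["progress", "safety", "fuel"]


def total_q(q_components: np.ndarray) -> np.ndarray:
    """Sum decomposed Q-values of shape (components, actions) over components."""
    return q_components.sum(axis=0)


def importance(q_components: np.ndarray) -> float:
    """Assumed importance measure: best minus worst total action value."""
    q = total_q(q_components)
    return float(q.max() - q.min())


def explain_top_states(trajectory, k):
    """Select the k most important states (global summary) and attach a
    per-component breakdown of the greedy action (local explanation)."""
    ranked = sorted(
        range(len(trajectory)),
        key=lambda i: importance(trajectory[i]),
        reverse=True,
    )
    summary = []
    for i in sorted(ranked[:k]):  # present selected states in trajectory order
        q_comp = trajectory[i]
        action = int(np.argmax(total_q(q_comp)))
        breakdown = {
            name: float(q_comp[c, action]) for c, name in enumerate(COMPONENTS)
        }
        summary.append((i, action, breakdown))
    return summary


# Made-up decomposed Q-values: one (components x actions) array per visited state.
trajectory = [
    np.array([[1.0, 1.1], [0.5, 0.4], [0.2, 0.2]]),   # actions nearly tied
    np.array([[2.0, 0.0], [-3.0, 1.0], [0.1, 0.1]]),  # safety separates the actions
    np.array([[0.3, 0.2], [0.1, 0.1], [0.0, 0.1]]),   # actions nearly tied
]

summary = explain_top_states(trajectory, k=1)
for state_index, action, breakdown in summary:
    print(state_index, action, breakdown)
```

In this toy trajectory only the second state has a large gap between action values, so it is the one selected for the summary, and the breakdown shows that the hypothetical safety component drives the chosen action. A real system would show video clips around the selected states rather than printed numbers.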
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Data Stream Mining Techniques
