TL;DR
This paper introduces an explainable reinforcement learning framework that extracts interestingness elements from an agent's history of interaction with its environment and uses them to build visual summaries that help humans understand the agent's capabilities and limitations.
Contribution
The paper presents an explainable reinforcement learning (XRL) framework that uses interaction data to generate visual summaries, making the strengths and weaknesses of RL agents easier to interpret.
Findings
Diversity of interestingness elements enhances human understanding of agents.
Visual summaries effectively communicate agent capabilities and limitations.
The framework relies on data readily available from standard RL algorithms, augmented with data the agent can easily collect while learning.
Abstract
We propose an explainable reinforcement learning (XRL) framework that analyzes an agent's history of interaction with the environment to extract interestingness elements that help explain its behavior. The framework relies on data readily available from standard RL algorithms, augmented with data that can easily be collected by the agent while learning. We describe how to create visual summaries of an agent's behavior in the form of short video-clips highlighting key interaction moments, based on the proposed elements. We also report on a user study where we evaluated the ability of humans to correctly perceive the aptitude of agents with different characteristics, including their capabilities and limitations, given visual summaries automatically generated by our framework. The results show that the diversity of aspects captured by the different interestingness elements is crucial to…
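The step from interaction data to short video clips can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' method: the `Step` record layout, the `q_spread` score (largest minus smallest action-value estimate), and the `select_highlights` helper are all hypothetical stand-ins, since this summary does not list the paper's actual interestingness elements.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Step:
    """One recorded interaction step (hypothetical record layout)."""
    t: int                     # timestep index within the episode
    q_values: Sequence[float]  # agent's action-value estimates at this step


def q_spread(step: Step) -> float:
    """Hypothetical interestingness score: max minus min action value."""
    return max(step.q_values) - min(step.q_values)


def select_highlights(history: List[Step], k: int, context: int) -> List[Tuple[int, int]]:
    """Return up to k (start, end) clip windows around the highest-scoring steps.

    Windows are clamped to the recorded timesteps, and a window that overlaps
    an already chosen one is skipped so the clips show distinct moments.
    """
    if not history:
        return []
    ranked = sorted(history, key=q_spread, reverse=True)
    last_t = max(s.t for s in history)
    clips: List[Tuple[int, int]] = []
    for step in ranked:
        start, end = max(0, step.t - context), min(last_t, step.t + context)
        if all(end < s or start > e for s, e in clips):
            clips.append((start, end))
        if len(clips) == k:
            break
    return sorted(clips)


# Toy history: steps 1 and 5 have the widest gap between action values.
toy_q = [[0.1, 0.2], [0.0, 0.9], [0.3, 0.4], [0.5, 0.5],
         [0.2, 0.3], [-1.0, 1.0], [0.0, 0.1]]
history = [Step(t, q) for t, q in enumerate(toy_q)]
print(select_highlights(history, k=2, context=1))  # [(0, 2), (4, 6)]
```

Each returned window is a frame range that could be cut from a recorded episode. Combining several different scores, rather than the single one used here, would be the natural way to reflect the diversity of elements that the abstract emphasizes.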
