Learning of Generalizable and Interpretable Knowledge in Grid-Based Reinforcement Learning Environments
Manuel Eberhardinger, Johannes Maucher, Setareh Maghsudi

TL;DR
This paper uses program synthesis to imitate deep reinforcement learning policies in grid-based environments, producing programs that can be inspected to interpret the agent's learned behavior and checked for correctness.
Contribution
It adapts DreamCoder for learning interpretable concepts in grid environments, enabling analysis of agent behavior through generated programs and decision visualization.
Findings
Programs reveal learned concepts and behaviors.
Visualization aids understanding of agent decision-making.
Different synthesis methods impact interpretability.
Abstract
Understanding the interactions of agents trained with deep reinforcement learning is crucial for deploying agents in games or the real world. In the former, unreasonable actions confuse players. In the latter, that effect is even more significant, as unexpected behavior can cause accidents with potentially grave and long-lasting consequences for the involved individuals. In this work, we propose using program synthesis to imitate reinforcement learning policies after seeing a trajectory of the action sequence. Programs have the advantage that they are inherently interpretable and verifiable for correctness. We adapt the state-of-the-art program synthesis system DreamCoder for learning concepts in grid-based environments, specifically, a navigation task and two miniature versions of Atari games, Space Invaders and Asterix. By inspecting the generated libraries, we can make inferences about…
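To make the idea of imitating a policy with a program concrete, here is a minimal sketch. It is not DreamCoder and not the authors' implementation: it is a plain enumerative search over a tiny, hypothetical if-then-else DSL. The 1-D grid state `(agent_x, goal_x)`, the predicate names, the action names, and the functions `programs_up_to`, `run`, and `synthesize` are all assumptions made for illustration. DreamCoder additionally learns a reusable library of abstractions and a neural search guide, which this sketch omits.

```python
from itertools import product

# Hypothetical toy setup: a state is (agent_x, goal_x) on a single grid row.
ACTIONS = ["left", "right", "stay"]

# Hypothetical DSL primitives: named boolean tests on a state.
PREDICATES = {
    "goal_is_left": lambda s: s[1] < s[0],
    "goal_is_right": lambda s: s[1] > s[0],
    "at_goal": lambda s: s[1] == s[0],
}


def programs_up_to(depth):
    """List candidate programs, shallower ones first.

    A program is either an action name (a leaf) or a tuple
    (predicate_name, then_program, else_program).
    """
    if depth == 0:
        return list(ACTIONS)
    smaller = programs_up_to(depth - 1)
    branches = [(p, t, e) for p, t, e in product(PREDICATES, smaller, smaller)]
    return smaller + branches


def run(program, state):
    """Evaluate a program on a state and return the chosen action."""
    if isinstance(program, str):
        return program
    pred, then_p, else_p = program
    return run(then_p if PREDICATES[pred](state) else else_p, state)


def synthesize(trajectory, max_depth=2):
    """Return the first enumerated program that reproduces every (state, action) pair."""
    for program in programs_up_to(max_depth):
        if all(run(program, s) == a for s, a in trajectory):
            return program
    return None


# A made-up trajectory from some policy walking right until it reaches the goal.
trajectory = [((0, 2), "right"), ((1, 2), "right"), ((2, 2), "stay")]
program = synthesize(trajectory)
print(program)
```

The returned program is a small nested tuple that can be read directly as a decision rule, which is the property that makes synthesized programs easier to inspect than network weights. Exhaustive enumeration only scales to very small DSLs; that limitation is one reason systems such as DreamCoder guide the search and grow a library instead.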
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics · Machine Learning and Data Classification
