Interpretable and Editable Programmatic Tree Policies for Reinforcement Learning
Hector Kohler, Quentin Delfosse, Riad Akrour, Kristian Kersting, Philippe Preux

TL;DR
This paper introduces INTERPRETER, a fast distillation method that creates interpretable, editable tree policies for reinforcement learning, enabling better understanding and correction of agent behaviors in various tasks.
Contribution
The paper presents an efficient distillation approach for generating interpretable and editable tree policies in reinforcement learning, addressing prior interpretable-policy methods that are inefficient or require many human priors.
Findings
Compact tree policies match their oracles across a diverse set of sequential decision tasks
Policies can be interpreted and edited to correct misalignments on Atari games
Policies can be used to explain real farming strategies
Abstract
Deep reinforcement learning agents are prone to goal misalignments. The black-box nature of their policies hinders the detection and correction of such misalignments, and the trust necessary for real-world deployment. So far, solutions learning interpretable policies are inefficient or require many human priors. We propose INTERPRETER, a fast distillation method producing INTerpretable Editable tRee Programs for ReinforcEmenT lEaRning. We empirically demonstrate that INTERPRETER's compact tree programs match oracles across a diverse set of sequential decision tasks and evaluate the impact of our design choices on interpretability and performance. We show that our policies can be interpreted and edited to correct misalignments on Atari games and to explain real farming strategies.
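To make the idea of distilling an oracle into an editable tree program more concrete, here is a minimal sketch. It is not the paper's algorithm: it assumes a hypothetical hand-written `oracle` in place of a trained deep RL policy, samples states uniformly instead of collecting them from environment rollouts, and fits a plain greedy axis-aligned tree by imitating the oracle's actions. All names (`oracle`, `fit_tree`, `act`, `to_program`) are illustrative.

```python
import copy
import random
from collections import Counter

def oracle(state):
    """Hypothetical stand-in for a trained black-box policy (not from the paper)."""
    x, y = state
    if x <= 0.5:
        return 0
    return 1 if y <= 0.3 else 2

def gini(actions):
    """Gini impurity of a list of action labels."""
    n = len(actions)
    return 1.0 - sum((c / n) ** 2 for c in Counter(actions).values())

def fit_tree(states, actions, depth):
    """Greedily fit an axis-aligned tree on (state, oracle action) pairs."""
    majority = Counter(actions).most_common(1)[0][0]
    if depth == 0 or len(set(actions)) == 1:
        return {"action": majority}
    best = None  # (weighted impurity, feature index, threshold)
    for f in range(len(states[0])):
        for t in sorted({s[f] for s in states}):
            left = [a for s, a in zip(states, actions) if s[f] <= t]
            right = [a for s, a in zip(states, actions) if s[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(actions)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return {"action": majority}
    _, f, t = best
    go_left = [s[f] <= t for s in states]

    def subset(keep):
        idx = [i for i, g in enumerate(go_left) if g == keep]
        return [states[i] for i in idx], [actions[i] for i in idx]

    return {"feature": f, "threshold": t,
            "left": fit_tree(*subset(True), depth - 1),
            "right": fit_tree(*subset(False), depth - 1)}

def act(tree, state):
    """Run the tree policy: follow splits until a leaf action is reached."""
    while "action" not in tree:
        branch = "left" if state[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["action"]

def to_program(tree, names, indent=""):
    """Render the tree as readable nested if/else program text."""
    if "action" in tree:
        return f"{indent}return {tree['action']}\n"
    head = f"{indent}if {names[tree['feature']]} <= {tree['threshold']:.2f}:\n"
    return (head + to_program(tree["left"], names, indent + "    ")
            + f"{indent}else:\n" + to_program(tree["right"], names, indent + "    "))

random.seed(0)
# Assumption: states are sampled uniformly; a real distillation would use rollouts.
train_states = [(random.random(), random.random()) for _ in range(300)]
tree = fit_tree(train_states, [oracle(s) for s in train_states], depth=2)
program = to_program(tree, names=["x", "y"])
print(program)

# Fidelity check: how often the tree picks the same action as the oracle.
test_states = [(random.random(), random.random()) for _ in range(1000)]
agreement = sum(act(tree, s) == oracle(s) for s in test_states) / len(test_states)
print(f"agreement with oracle: {agreement:.3f}")

# Editing: overwrite one branch by hand, leaving the rest of the policy untouched.
edited = copy.deepcopy(tree)
edited["left"] = {"action": 2}
```

Because the policy is a small explicit data structure, a human can read the printed if/else program, spot an undesired branch, and replace it directly, which is the kind of inspection and correction the abstract refers to. The depth limit here is only one simple way to keep the tree compact.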
Taxonomy
Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification
Methods: Sparse Evolutionary Training
