Neural-to-Tree Policy Distillation with Policy Improvement Criterion
Zhao-Hua Li, Yang Yu, Yingfeng Chen, Ke Chen, Zhipeng Hu, Changjie Fan

TL;DR
This paper introduces a novel policy distillation method that improves interpretability and performance in reinforcement learning by maximizing an advantage evaluation, reducing data shift issues associated with behavior cloning.
Contribution
It proposes a new distillation objective based on advantage evaluation, enhancing policy fidelity and interpretability in reinforcement learning.
Findings
Outperforms behavior cloning in preserving cumulative reward
Produces more consistent and robust policies
Generates interpretable decision trees with reasonable rules
Abstract
While deep reinforcement learning has achieved promising results in challenging decision-making tasks, the backbone of its success, the deep neural network, is mostly a black box. A feasible way to gain insight into a black-box model is to distill it into an interpretable model such as a decision tree, which consists of if-then rules and is easy to grasp and verify. However, traditional model distillation is usually a supervised learning task under a stationary data distribution assumption, which is violated in reinforcement learning. Therefore, a typical policy distillation that clones model behaviors with even a small error can cause a data distribution shift, resulting in an unsatisfactory distilled policy model with low fidelity or low performance. In this paper, we propose to address this issue by changing the distillation objective from behavior cloning to maximizing an…
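To make the contrast in the abstract concrete, here is a minimal sketch of how a single decision-tree leaf might choose its action under the two objectives. It is an illustration only, not the paper's algorithm: the Q-values are made up, the function names are hypothetical, and the advantage is assumed to be A(s, a) = Q(s, a) - max_a' Q(s, a') with respect to a greedy teacher. The excerpt above does not state the exact criterion.

```python
from collections import Counter

def teacher_action(q_row):
    # Greedy teacher: pick the action with the highest Q-value.
    return max(range(len(q_row)), key=lambda a: q_row[a])

def cloning_leaf_action(q_rows):
    # Behavior cloning: the leaf predicts the teacher's most frequent action,
    # i.e. it minimizes the number of mismatches with the teacher.
    votes = Counter(teacher_action(q) for q in q_rows)
    return votes.most_common(1)[0][0]

def advantage_leaf_action(q_rows):
    # Advantage-based choice (assumed form): the leaf predicts the action
    # whose summed advantage over the states in the leaf is largest.
    n_actions = len(q_rows[0])

    def total_advantage(a):
        return sum(q[a] - max(q) for q in q_rows)

    return max(range(n_actions), key=total_advantage)

# Hypothetical teacher Q-values for three states that fall into one leaf.
# Two states slightly prefer action 0; one state strongly prefers action 1.
q_rows = [
    [1.0, 0.9],
    [1.0, 0.9],
    [0.0, 5.0],
]

bc_action = cloning_leaf_action(q_rows)
adv_action = advantage_leaf_action(q_rows)
print(bc_action, adv_action)
```

In this toy leaf, cloning follows the majority vote even though the one disagreement is costly, while the advantage-based choice accepts two cheap mismatches to avoid one expensive mistake. This is one way to read the claim that small cloning errors can translate into low performance.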
Taxonomy
Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning
