Neural-to-Tree Policy Distillation with Policy Improvement Criterion

Zhao-Hua Li; Yang Yu; Yingfeng Chen; Ke Chen; Zhipeng Hu; Changjie Fan

arXiv:2108.06898 · cs.LG · August 17, 2021 · 1 citation

TL;DR

This paper introduces a policy distillation method that replaces the behavior cloning objective with maximizing an advantage evaluation. The change reduces the data distribution shift that behavior cloning suffers from in reinforcement learning and yields an interpretable decision-tree policy.

Contribution

The paper proposes a new distillation objective based on advantage evaluation, which improves the fidelity and interpretability of the distilled policy in reinforcement learning.

Findings

01. Outperforms behavior cloning in preserving cumulative reward
02. Produces more consistent and robust policies
03. Generates interpretable decision trees with reasonable rules

Abstract

While deep reinforcement learning has achieved promising results in challenging decision-making tasks, the backbone of its success, the deep neural network, is mostly a black box. A feasible way to gain insight into a black-box model is to distill it into an interpretable model such as a decision tree, which consists of if-then rules and is easy to grasp and verify. However, traditional model distillation is usually a supervised learning task under a stationary data distribution assumption, which is violated in reinforcement learning. Therefore, a typical policy distillation that clones model behaviors, even with a small error, can cause a data distribution shift, resulting in an unsatisfactory distilled policy model with low fidelity or low performance. In this paper, we propose to address this issue by changing the distillation objective from behavior cloning to maximizing an…
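The contrast between a cloning objective and an advantage-based objective can be illustrated with a small sketch. This is not the paper's algorithm (the abstract above is truncated before the objective is stated in full); it only shows, under stated assumptions, how the two criteria can disagree when labeling a single tree leaf. The toy Q-values, the function names, and the choice V(s) = max_a Q(s, a) are all hypothetical and chosen for illustration.

```python
from collections import Counter

# Hypothetical toy data for the states that fall into one tree leaf.
# Each sample is (teacher_action, teacher_q_values); the numbers are
# illustrative only and are not results from the paper.
LEAF_SAMPLES = [
    (0, [1.0, 0.9]),   # teacher prefers action 0, but action 1 is nearly as good
    (0, [1.0, 0.9]),
    (1, [-5.0, 1.0]),  # teacher prefers action 1, and action 0 is very costly
]


def advantages(q_values):
    """Advantage of each action, assuming V(s) = max_a Q(s, a) (greedy teacher)."""
    state_value = max(q_values)
    return [q - state_value for q in q_values]


def cloning_label(samples):
    """Cloning criterion: label the leaf with the teacher's most frequent action."""
    counts = Counter(action for action, _ in samples)
    return counts.most_common(1)[0][0]


def advantage_label(samples):
    """Advantage criterion: label the leaf with the action whose summed advantage is largest."""
    n_actions = len(samples[0][1])
    totals = [0.0] * n_actions
    for _, q_values in samples:
        for action, adv in enumerate(advantages(q_values)):
            totals[action] += adv
    return max(range(n_actions), key=lambda action: totals[action])


if __name__ == "__main__":
    print("cloning label:  ", cloning_label(LEAF_SAMPLES))
    print("advantage label:", advantage_label(LEAF_SAMPLES))
```

On this toy leaf the cloning criterion picks action 0 because the teacher chose it in two of three states, while the advantage criterion picks action 1 because choosing action 0 in the third state would be very costly. The sketch weighs mistakes by their estimated consequence rather than by their count, which is the general intuition behind preferring an advantage-based objective to a pure imitation loss.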

Taxonomy

Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning