Explainable RL Policies by Distilling to Locally-Specialized Linear Policies with Voronoi State Partitioning
Senne Deproost, Dennis Steckelmacher, Ann Nowé

TL;DR
This paper introduces a method to create explainable reinforcement learning policies by partitioning the state space with Voronoi diagrams and distilling the policy into locally linear models, balancing interpretability and performance.
Contribution
The paper presents a novel model-agnostic approach using Voronoi partitioning to distill deep RL policies into locally linear, human-readable models, improving transparency without sacrificing performance.
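To make the idea concrete, here is a minimal sketch of distilling a black-box policy into one linear model per Voronoi cell. It is an illustration under strong assumptions, not the authors' implementation: states are one-dimensional, the centroids are fixed by hand, each local model is fitted by ordinary least squares, and every name (`teacher_policy`, `nearest_cell`, `fit_line`, `distill`, `student_policy`) is hypothetical. How the paper actually places the Voronoi regions and trains the local models is not covered by this summary.

```python
# Illustrative sketch only: 1-D states, hand-picked centroids, least-squares fits.
# All function names are hypothetical and not taken from the paper.

def teacher_policy(state):
    """Stand-in for a trained black-box policy (here simply |state|)."""
    return abs(state)

def nearest_cell(state, centroids):
    """Voronoi assignment: index of the centroid closest to the state."""
    return min(range(len(centroids)), key=lambda i: abs(state - centroids[i]))

def fit_line(xs, ys):
    """Ordinary least squares for y = w * x + b on scalar data.

    Assumes at least two distinct x values; an empty or degenerate
    cell would need special handling.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = cov_xy / var_x
    return w, mean_y - w * mean_x

def distill(states, centroids):
    """Fit one linear policy per Voronoi cell to the teacher's actions."""
    buckets = [[] for _ in centroids]
    for s in states:
        buckets[nearest_cell(s, centroids)].append(s)
    return [fit_line(cell, [teacher_policy(s) for s in cell]) for cell in buckets]

def student_policy(state, centroids, models):
    """Act with the linear model of the cell that contains the state."""
    w, b = models[nearest_cell(state, centroids)]
    return w * state + b

centroids = [-1.0, 1.0]                    # two cells, boundary at 0
states = [i / 10 for i in range(-20, 21)]  # sampled states in [-2, 2]
models = distill(states, centroids)        # one (slope, intercept) pair per cell
```

In this toy setting a single linear model cannot represent the V-shaped teacher, while the two local models recover slopes of roughly -1 and +1. Each `(slope, intercept)` pair can be read directly, which is the kind of local readability the contribution refers to.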
Findings
Distilled policies are explainable and match original performance.
Voronoi-based partitioning effectively captures local dynamics.
Distillation can outperform the original black-box policy in some cases.
Abstract
Deep Reinforcement Learning is one of the state-of-the-art methods for producing near-optimal system controllers. However, deep RL algorithms train a deep neural network that lacks transparency, which poses challenges when the controller has to meet regulations or foster trust. To alleviate this, one could transfer the learned behaviour into a model that is human-readable by design using knowledge distillation. Often this is done with a single model, which mimics the original model on average but could struggle in more dynamic situations. A key challenge is that this simpler model should strike the right balance between flexibility and complexity, or between bias and accuracy. We propose a new model-agnostic method to divide the state space into regions where a simplified, human-understandable model can operate. In this paper, we use Voronoi partitioning to…
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
