Interpreting Neural Policies with Disentangled Tree Representations
Tsun-Hsuan Wang, Wei Xiao, Tim Seyde, Ramin Hasani, Daniela Rus

TL;DR
This paper explores how disentangled representations and decision trees can improve the interpretability of neural policies in robot learning, providing metrics and analysis to understand decision factors.
Contribution
It introduces a novel approach combining disentangled representations with decision trees to enhance interpretability of neural policies in robotics.
Findings
Disentangled representations correlate with clearer decision factors.
The proposed interpretability metrics quantify how well a network's learned neural dynamics disentangle into decision factors.
Experimental results confirm the link between disentanglement and interpretability.
Abstract
The advancement of robots, particularly those functioning in complex human-centric environments, relies on control solutions that are driven by machine learning. Understanding how learning-based controllers make decisions is crucial since robots are often safety-critical systems. This urges a formal and quantitative understanding of the explanatory factors in the interpretability of robot learning. In this paper, we aim to study interpretability of compact neural policies through the lens of disentangled representation. We leverage decision trees to obtain factors of variation [1] for disentanglement in robot learning; these encapsulate skills, behaviors, or strategies toward solving tasks. To assess how well networks uncover the underlying task dynamics, we introduce interpretability metrics that measure disentanglement of learned neural dynamics from a concentration of decisions,…
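As a rough illustration of what "a concentration of decisions" could mean in practice, the sketch below scores how tightly a single neuron's activity clusters onto a few branches of a decision tree. It is a minimal sketch under stated assumptions, not the paper's metric, whose definition the truncated abstract does not give. The function name `decision_concentration`, the idea of assigning each time step to a leaf index, and the entropy-based score are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def decision_concentration(leaf_ids, num_leaves):
    """Score how concentrated a neuron's decisions are, in [0, 1].

    Hypothetical illustration, not the paper's definition.
    leaf_ids:   one decision-tree leaf index per time step (assumed input).
    num_leaves: total number of leaves in the tree.
    Returns 1 minus the normalized Shannon entropy of leaf usage:
    1.0 when every step falls in one leaf, 0.0 when usage is uniform.
    """
    if num_leaves < 2 or not leaf_ids:
        return 1.0  # a single possible decision is trivially concentrated
    counts = Counter(leaf_ids)
    total = len(leaf_ids)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return 1.0 - entropy / math.log(num_leaves)


# A neuron whose steps mostly land in one leaf scores higher than one
# whose steps are spread evenly over all leaves.
focused = decision_concentration([0, 0, 0, 0, 0, 0, 0, 1], num_leaves=4)
spread = decision_concentration([0, 1, 2, 3, 0, 1, 2, 3], num_leaves=4)
```

Under this assumed score, a neuron tied to one skill or behavior would sit near 1, while a neuron whose activity mixes many behaviors would sit near 0.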
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning
