Designing Interpretable Approximations to Deep Reinforcement Learning

Nathan Dahlin; Krishna Chaitanya Kalagarla; Nikhil Naik; Rahul Jain; Pierluigi Nuzzo

arXiv:2010.14785 · cs.LG · June 22, 2021 · 1 citation

Open Access

TL;DR

This paper explores creating simpler, interpretable models that approximate deep reinforcement learning systems, aiming to balance performance with explainability and efficiency.

Contribution

The paper introduces methods for designing reduced models that maintain a desired performance level while remaining interpretable, evaluated on decision tree variants and kernel machines in benchmark reinforcement learning tasks.

Findings

01. Reduced models can preserve a desired performance level.

02. Interpretable models can succinctly explain the latent knowledge represented by a DNN.

03. The approach is illustrated on benchmark RL tasks.

Abstract

In an ever-expanding set of research and application areas, deep neural networks (DNNs) set the bar for algorithm performance. However, depending upon additional constraints such as processing power and execution time limits, or requirements such as verifiable safety guarantees, it may not be feasible to actually use such high-performing DNNs in practice. Many techniques have been developed in recent years to compress or distill complex DNNs into smaller, faster or more understandable models and controllers. This work seeks to identify reduced models that not only preserve a desired performance level, but also, for example, succinctly explain the latent knowledge represented by a DNN. We illustrate the effectiveness of the proposed approach on the evaluation of decision tree variants and kernel machines in the context of benchmark reinforcement learning tasks.
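To make the idea of distilling a policy into a decision tree concrete, here is a minimal sketch in plain Python. It is not the authors' method. The `teacher_policy` function is a hypothetical stand-in for a trained DNN policy on a two-dimensional, cart-pole-like state, and the greedy Gini-based tree, the uniform state sampling, and the names `fit_tree`, `predict` and `fidelity` are all assumptions chosen for illustration. The sketch labels sampled states with the teacher's actions, fits depth-limited trees, and reports how often each tree agrees with the teacher on held-out states.

```python
import random

def teacher_policy(state):
    # Hypothetical stand-in for a trained DNN policy (an assumption):
    # push right (1) when the pole leans or falls right, else push left (0).
    angle, angular_velocity = state
    return 1 if angle + 0.5 * angular_velocity > 0 else 0

def gini(labels):
    # Gini impurity for binary labels in {0, 1}.
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def majority(labels):
    return 1 if 2 * sum(labels) >= len(labels) else 0

def fit_tree(states, labels, depth):
    # Return a leaf (an int action) once the depth budget is spent or the node is pure.
    if depth == 0 or gini(labels) == 0.0:
        return majority(labels)
    best = None
    for feature in range(len(states[0])):
        for threshold in sorted({s[feature] for s in states}):
            left = [y for s, y in zip(states, labels) if s[feature] <= threshold]
            right = [y for s, y in zip(states, labels) if s[feature] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    if best is None:
        return majority(labels)
    _, feature, threshold = best
    go_left = [s[feature] <= threshold for s in states]
    left_s = [s for s, g in zip(states, go_left) if g]
    left_y = [y for y, g in zip(labels, go_left) if g]
    right_s = [s for s, g in zip(states, go_left) if not g]
    right_y = [y for y, g in zip(labels, go_left) if not g]
    # Internal node: (feature index, threshold, left subtree, right subtree).
    return (feature, threshold,
            fit_tree(left_s, left_y, depth - 1),
            fit_tree(right_s, right_y, depth - 1))

def predict(tree, state):
    # Walk from the root until an int leaf is reached.
    while isinstance(tree, tuple):
        feature, threshold, left, right = tree
        tree = left if state[feature] <= threshold else right
    return tree

def fidelity(tree, states):
    # Fraction of states on which the tree picks the teacher's action.
    return sum(predict(tree, s) == teacher_policy(s) for s in states) / len(states)

rng = random.Random(0)

def sample_states(n):
    # Assumption: states drawn uniformly from [-1, 1]^2, not from rollouts.
    return [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]

train_states = sample_states(300)
train_labels = [teacher_policy(s) for s in train_states]
test_states = sample_states(1000)

for depth in (1, 2, 4):
    tree = fit_tree(train_states, train_labels, depth)
    print(f"depth={depth} fidelity={fidelity(tree, test_states):.3f}")
```

A shallow tree is easy to read as a handful of threshold rules, while a deeper tree tracks the teacher more closely at the cost of more rules. Note that agreement with the teacher is only a proxy: a reduced model would still need to be evaluated on task return to check that it preserves a desired performance level.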

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)