Policy Distillation with Selective Input Gradient Regularization for Efficient Interpretability

Jinwei Xing; Takashi Nagata; Xinyun Zou; Emre Neftci; Jeffrey L. Krichmar

arXiv:2205.08685 · cs.LG · May 19, 2022

TL;DR

This paper introduces DIGR (Distillation with selective Input Gradient Regularization), a method that combines policy distillation with input gradient regularization to produce RL policies whose saliency maps are both interpretable and cheap to compute. The resulting policies are also more robust to adversarial attacks.

Contribution

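The paper proposes DIGR, which uses policy distillation together with selective input gradient regularization to produce new RL policies. These policies yield interpretable saliency maps at low computational cost and are more robust to adversarial attacks.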

Findings

1. DIGR produces high-quality, interpretable saliency maps in real time.

2. The approach makes RL policies more robust to adversarial attacks.

3. Experiments on MiniGrid, Atari, and CARLA support the method's effectiveness.

Abstract

Although deep Reinforcement Learning (RL) has proven successful in a wide range of tasks, one challenge it faces is interpretability when applied to real-world problems. Saliency maps are frequently used to provide interpretability for deep neural networks. However, in the RL domain, existing saliency map approaches are either computationally expensive, and thus cannot satisfy the real-time requirement of real-world scenarios, or cannot produce interpretable saliency maps for RL policies. In this work, we propose Distillation with selective Input Gradient Regularization (DIGR), an approach that uses policy distillation and input gradient regularization to produce new policies that achieve both high interpretability and computational efficiency in generating saliency maps. Our approach is also found to improve the robustness of RL policies to multiple adversarial attacks. We conduct…
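The combination the abstract describes, a distillation term plus a penalty on selected input gradients, can be sketched as a single loss. The following is a minimal sketch under stated assumptions, not the authors' implementation. The student is a toy linear-softmax policy so that the input gradient has a closed form; the distillation term is assumed to be KL(teacher || student); the penalized gradient is assumed to be that of the log-probability of the teacher's preferred action; and `mask` is a hypothetical 0/1 vector marking which input features are regularized, since the abstract does not say how the selection is made. All function names and the weight `lam` are illustrative.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def student_probs(W, x):
    # Toy student policy (assumption): logits[a] = sum_i W[a][i] * x[i].
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    return softmax(logits)


def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def input_gradient(W, x, action):
    # Closed-form d log pi(action | x) / dx for the linear-softmax student:
    # W[action] - sum_k p_k * W[k].
    p = student_probs(W, x)
    return [
        W[action][i] - sum(p[k] * W[k][i] for k in range(len(W)))
        for i in range(len(x))
    ]


def digr_style_loss(W, x, teacher_p, mask, lam=0.1):
    # Distillation term: match the student's action distribution to the teacher's.
    distill = kl_divergence(teacher_p, student_probs(W, x))
    # Penalize squared input gradients only where mask[i] == 1 (hypothetical selection).
    action = max(range(len(teacher_p)), key=teacher_p.__getitem__)
    grad = input_gradient(W, x, action)
    penalty = sum(m * g * g for m, g in zip(mask, grad))
    return distill + lam * penalty, distill, penalty


if __name__ == "__main__":
    W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]  # 2 actions, 3 input features
    x = [0.5, -0.5, 1.0]
    teacher_p = [0.9, 0.1]
    total, distill, penalty = digr_style_loss(W, x, teacher_p, mask=[0, 0, 1])
    print(total, distill, penalty)
```

In this sketch a saliency map is simply the magnitude of `input_gradient`, which costs one gradient evaluation per observation. With a neural-network student the same penalty would need automatic differentiation through the input gradient rather than the closed form used here.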

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)

Methods: Entropy Regularization · Proximal Policy Optimization · CARLA: An Open Urban Driving Simulator