Towards Interpretable Deep Reinforcement Learning Models via Inverse Reinforcement Learning

Sean Xie; Soroush Vosoughi; Saeed Hassanpour

arXiv:2203.16464 · cs.LG · March 4, 2024

Open Access

TL;DR

This paper introduces a novel framework using Adversarial Inverse Reinforcement Learning to provide global explanations for deep reinforcement learning models, enhancing interpretability by summarizing their decision-making processes.

Contribution

It presents a new approach that leverages inverse reinforcement learning to interpret and explain the behavior of deep reinforcement learning models globally.

Findings

01. Provides global explanations for RL decisions
02. Captures intuitive tendencies of models
03. Enhances interpretability of deep RL models

Abstract

Artificial intelligence, particularly through recent advancements in deep learning, has achieved exceptional performances in many tasks in fields such as natural language processing and computer vision. In addition to desirable evaluation metrics, a high level of interpretability is often required for these models to be reliably utilized. Therefore, explanations that offer insight into the process by which a model maps its inputs onto its outputs are much sought-after. Unfortunately, the current black box nature of machine learning models is still an unresolved issue and this very nature prevents researchers from learning and providing explicative descriptions for a model's behavior and final predictions. In this work, we propose a novel framework utilizing Adversarial Inverse Reinforcement Learning that can provide global explanations for decisions made by a Reinforcement Learning…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications