Fidelity-Induced Interpretable Policy Extraction for Reinforcement Learning

Xiao Liu; Wubing Chen; Mao Tan

arXiv:2309.06097·cs.AI·September 13, 2023

TL;DR

This paper introduces FIPE, a novel method for extracting interpretable policies in deep reinforcement learning that emphasizes fidelity and consistency, improving explanation reliability in complex environments like StarCraft II.

Contribution

FIPE integrates fidelity measurement into policy extraction, addressing inconsistency issues and enhancing interpretability and performance in complex RL tasks.

Findings

1. FIPE outperforms baselines in interaction performance.

2. FIPE achieves higher consistency in explanations.

3. FIPE is effective in complex environments like StarCraft II.

Abstract

Deep Reinforcement Learning (DRL) has achieved remarkable success in sequential decision-making problems. However, existing DRL agents make decisions in an opaque fashion, which hinders users from establishing trust in the agents and scrutinizing their weaknesses. While recent research has developed Interpretable Policy Extraction (IPE) methods for explaining how an agent takes actions, their explanations are often inconsistent with the agent's behavior and thus frequently fail to explain it. To tackle this issue, we propose a novel method, Fidelity-Induced Policy Extraction (FIPE). Specifically, we start by analyzing the optimization mechanism of existing IPE methods, elaborating on how they ignore consistency while increasing cumulative rewards. We then design a fidelity-induced mechanism by integrating a fidelity measurement into the reinforcement learning feedback. We conduct experiments…
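
The abstract does not spell out how the fidelity measurement is defined or how it enters the feedback, so the following is only a minimal sketch under stated assumptions, not the authors' formulation. It assumes discrete actions, treats fidelity as the fraction of states where the extracted (interpretable) policy picks the same action as the DRL agent, and folds a per-step agreement bonus into the reward. The names `fidelity`, `fidelity_shaped_reward`, and `weight` are hypothetical.

```python
from typing import Hashable, Sequence


def fidelity(agent_actions: Sequence[Hashable],
             extracted_actions: Sequence[Hashable]) -> float:
    """Fraction of states where the extracted policy agrees with the agent.

    Assumption: actions are discrete and the two sequences are aligned
    by state (index i in both refers to the same state).
    """
    if len(agent_actions) != len(extracted_actions):
        raise ValueError("action sequences must have the same length")
    if not agent_actions:
        return 0.0
    matches = sum(a == e for a, e in zip(agent_actions, extracted_actions))
    return matches / len(agent_actions)


def fidelity_shaped_reward(env_reward: float,
                           agent_action: Hashable,
                           extracted_action: Hashable,
                           weight: float = 1.0) -> float:
    """Environment reward plus a hypothetical agreement bonus.

    Assumption: the fidelity term is added linearly with a tunable
    weight; the paper may combine the two signals differently.
    """
    agreement = 1.0 if agent_action == extracted_action else 0.0
    return env_reward + weight * agreement


if __name__ == "__main__":
    # Four states; the extracted policy disagrees with the agent on one.
    print(fidelity([0, 1, 1, 2], [0, 1, 0, 2]))
    # Same action chosen, so the bonus is added to the environment reward.
    print(fidelity_shaped_reward(1.0, 2, 2, weight=0.5))
```

With `weight` set to zero, the shaped reward reduces to the plain environment reward, which corresponds to the situation the abstract criticizes: optimizing cumulative reward while ignoring consistency with the agent.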

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning
