Fidelity-Induced Interpretable Policy Extraction for Reinforcement Learning
Xiao Liu, Wubing Chen, Mao Tan

TL;DR
This paper introduces FIPE, a novel method for extracting interpretable policies in deep reinforcement learning that emphasizes fidelity and consistency, improving explanation reliability in complex environments like StarCraft II.
Contribution
FIPE integrates fidelity measurement into policy extraction, addressing inconsistency issues and enhancing interpretability and performance in complex RL tasks.
Findings
FIPE outperforms baselines in interaction performance.
FIPE achieves higher consistency in explanations.
FIPE is effective in complex environments like StarCraft II.
Abstract
Deep Reinforcement Learning (DRL) has achieved remarkable success in sequential decision-making problems. However, existing DRL agents make decisions in an opaque fashion, hindering the user from establishing trust and scrutinizing weaknesses of the agents. While recent research has developed Interpretable Policy Extraction (IPE) methods for explaining how an agent takes actions, their explanations are often inconsistent with the agent's behavior and thus frequently fail to explain. To tackle this issue, we propose a novel method, Fidelity-Induced Policy Extraction (FIPE). Specifically, we start by analyzing the optimization mechanism of existing IPE methods, elaborating on the issue of ignoring consistency while increasing cumulative rewards. We then design a fidelity-induced mechanism by integrating a fidelity measurement into the reinforcement learning feedback. We conduct experiments…
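The idea of folding a fidelity measurement into the reinforcement learning feedback can be illustrated with a small sketch. This is not the paper's formulation: the agreement-rate definition of fidelity, the additive bonus, and the names `fidelity`, `shaped_reward`, and `weight` are all assumptions made for illustration. Here the "teacher" stands for the DRL agent and the "student" for the extracted interpretable policy.

```python
from typing import Callable, Sequence

State = Sequence[float]
Policy = Callable[[State], int]  # maps a state to a discrete action index


def fidelity(teacher: Policy, student: Policy, states: Sequence[State]) -> float:
    """Fraction of states on which the student picks the teacher's action.

    Assumed definition for illustration; the paper may measure fidelity differently.
    """
    if not states:
        raise ValueError("need at least one state")
    matches = sum(1 for s in states if teacher(s) == student(s))
    return matches / len(states)


def shaped_reward(env_reward: float, teacher_action: int,
                  student_action: int, weight: float = 1.0) -> float:
    """Environment reward plus a bonus when the student agrees with the teacher.

    Hypothetical additive shaping; `weight` trades off reward against agreement.
    """
    agreement = 1.0 if teacher_action == student_action else 0.0
    return env_reward + weight * agreement


# Toy example: two threshold policies over a one-dimensional state.
teacher = lambda s: int(s[0] > 0.0)
student = lambda s: int(s[0] > 0.5)
states = [(-1.0,), (0.25,), (0.75,), (2.0,)]

print(fidelity(teacher, student, states))        # agreement rate on the sample
print(shaped_reward(1.0, 1, 1, weight=0.5))      # reward with agreement bonus
print(shaped_reward(1.0, 1, 0, weight=0.5))      # reward without bonus
```

With a feedback signal of this shape, an extracted policy that gains reward by diverging from the agent's actions is penalized relative to one that earns the same reward while agreeing with it, which is the consistency concern the abstract raises.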
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning
