Compositional Concept-Based Neuron-Level Interpretability for Deep Reinforcement Learning

Zeyu Jiang; Hai Huang; Xingquan Zuo

arXiv:2502.00684·cs.LG·February 4, 2025

TL;DR

This paper introduces a neuron-level interpretability method for deep reinforcement learning models, using concept-based explanations to improve transparency and align neural activations with human-understandable concepts.

Contribution

The paper presents a novel approach to interpreting DRL networks: it formalizes atomic concepts and analyzes neuron activations to expose the networks' internal decision mechanisms.

Findings

01 Effectively identifies meaningful concepts in DRL models

02 Aligns neuron activations with human-understandable concepts

03 Provides faithful explanations of network decision-making

Abstract

Deep reinforcement learning (DRL), through learning policies or values represented by neural networks, has successfully addressed many complex control problems. However, the neural networks introduced by DRL lack interpretability and transparency. Current DRL interpretability methods largely treat neural networks as black boxes, with few approaches delving into the internal mechanisms of policy/value networks. This limitation undermines trust in both the neural network models that represent policies and the explanations derived from them. In this work, we propose a novel concept-based interpretability method that provides fine-grained explanations of DRL models at the neuron level. Our method formalizes atomic concepts as binary functions over the state space and constructs complex concepts through logical operations. By analyzing the correspondence between neuron activations and…
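
To make the abstract's idea concrete, here is a minimal sketch of atomic concepts as binary functions over states, composed with logical operations and then matched against a neuron's activations. The toy 2-D state, the concept names (`pos_right`, `moving_right`), the activation threshold, and the use of intersection-over-union (IoU) as the matching score are all assumptions made for illustration; the abstract is truncated before it says how the correspondence is measured, so this should not be read as the authors' implementation.

```python
import numpy as np

# Hypothetical atomic concepts: binary functions over a toy state
# whose columns are (position, velocity). Not taken from the paper.
def pos_right(states):
    return states[:, 0] > 0.0

def moving_right(states):
    return states[:, 1] > 0.0

# Logical operations that build complex concepts from simpler ones.
def c_and(f, g):
    return lambda s: f(s) & g(s)

def c_or(f, g):
    return lambda s: f(s) | g(s)

def c_not(f):
    return lambda s: ~f(s)

def iou(a, b):
    """Intersection-over-union of two boolean masks (assumed score)."""
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0

def best_concept(activations, states, candidates, threshold=0.0):
    """Binarize one neuron's activations and score every candidate concept."""
    fired = activations > threshold
    scores = {name: iou(fired, fn(states)) for name, fn in candidates.items()}
    return max(scores, key=scores.get), scores

rng = np.random.default_rng(0)
states = rng.normal(size=(1000, 2))

# Synthetic "neuron" that fires only when both atomic concepts hold.
activations = np.where((states[:, 0] > 0) & (states[:, 1] > 0), 1.0, -1.0)

candidates = {
    "pos_right": pos_right,
    "moving_right": moving_right,
    "pos_right AND moving_right": c_and(pos_right, moving_right),
    "pos_right OR moving_right": c_or(pos_right, moving_right),
    "NOT pos_right": c_not(pos_right),
}

best, scores = best_concept(activations, states, candidates)
print(best, scores[best])
```

In this synthetic setup the conjunction matches the neuron exactly, while each atomic concept alone covers extra states and scores lower. A real analysis would replace the synthetic activations with values recorded from a policy or value network and would need a search strategy over the much larger space of composed concepts.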

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Fuzzy Logic and Control Systems · Adversarial Robustness in Machine Learning

Methods: ALIGN