Enhancing Interpretability in Deep Reinforcement Learning through Semantic Clustering
Liang Zhang, Justin Lieffers, Adarsh Pyarelal

TL;DR
This paper introduces a semantic clustering module for deep reinforcement learning that enhances interpretability by revealing semantic organization within the model's internal representations, without extensive manual annotation.
Contribution
The paper proposes a novel semantic clustering module integrated into the DRL training pipeline, improving interpretability and providing new analytical methods for understanding policy hierarchy and semantic structure.
Findings
Semantic clustering reveals the internal semantic organization of DRL models
The module improves interpretability without manual annotation
Experimental validation confirms the effectiveness of the approach
Abstract
In this paper, we explore semantic clustering properties of deep reinforcement learning (DRL) to improve its interpretability and deepen our understanding of its internal semantic organization. In this context, semantic clustering refers to the ability of neural networks to cluster inputs based on their semantic similarity in the feature space. We propose a DRL architecture that incorporates a novel semantic clustering module that combines feature dimensionality reduction with online clustering. This module integrates seamlessly into the DRL training pipeline, addressing the instability of t-SNE and eliminating the need for extensive manual annotation inherent to prior semantic analysis methods. We experimentally validate the effectiveness of the proposed module and demonstrate its ability to reveal semantic clustering properties within DRL. Furthermore, we introduce new analytical…
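The abstract describes the module only at a high level: feature dimensionality reduction combined with online clustering, running inside the training loop. What follows is a hypothetical illustration of that general idea, not the authors' implementation. It assumes a fixed Gaussian random projection for the reduction step and MacQueen-style online k-means for the clustering step. The class name OnlineSemanticClusterer and its parameters (in_dim, out_dim, k, seed) are invented for this example.

```python
import math
import random


class OnlineSemanticClusterer:
    """Hypothetical sketch: random projection + online (MacQueen) k-means.

    Not the paper's module; it only illustrates pairing dimensionality
    reduction with online clustering on a stream of feature vectors.
    """

    def __init__(self, in_dim, out_dim, k, seed=0):
        rng = random.Random(seed)
        # Fixed Gaussian random projection scaled by 1/sqrt(out_dim)
        # (Johnson-Lindenstrauss style); it is never refit, so the same
        # input always maps to the same low-dimensional point.
        scale = 1.0 / math.sqrt(out_dim)
        self.proj = [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
                     for _ in range(out_dim)]
        self.k = k
        self.centroids = []  # seeded with the first k projected inputs
        self.counts = []     # number of points assigned to each centroid

    def reduce(self, x):
        """Project a feature vector of length in_dim down to out_dim."""
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.proj]

    def _nearest(self, z):
        """Index of the centroid closest to z in squared Euclidean distance."""
        dists = [sum((zi - ci) ** 2 for zi, ci in zip(z, c))
                 for c in self.centroids]
        return min(range(len(dists)), key=dists.__getitem__)

    def update(self, x):
        """Assign one feature vector to a cluster and move that centroid."""
        z = self.reduce(x)
        if len(self.centroids) < self.k:
            self.centroids.append(list(z))
            self.counts.append(1)
            return len(self.centroids) - 1
        j = self._nearest(z)
        self.counts[j] += 1
        eta = 1.0 / self.counts[j]  # step 1/n keeps each centroid a running mean
        self.centroids[j] = [c + eta * (zi - c)
                             for c, zi in zip(self.centroids[j], z)]
        return j

    def predict(self, x):
        """Cluster label for x without updating any centroid."""
        return self._nearest(self.reduce(x))
```

In a DRL setting, x would plausibly be an intermediate activation of the policy network collected during rollouts (again an assumption). The two pieces are chosen for contrast with t-SNE, which is non-parametric, gives no mapping for new inputs, and can vary between runs. A fixed projection plus an incremental centroid update processes one sample at a time, so it can run alongside training without re-embedding the whole dataset.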
Taxonomy
Topics: Artificial Intelligence in Games · Digital Games and Media · Gambling Behavior and Treatments
