Enhancing Interpretability in Deep Reinforcement Learning through Semantic Clustering

Liang Zhang; Justin Lieffers; Adarsh Pyarelal

arXiv:2409.17411 · cs.AI · October 27, 2025

TL;DR

This paper introduces a semantic clustering module for deep reinforcement learning that enhances interpretability by revealing semantic organization within the model's internal representations, without extensive manual annotation.

Contribution

The paper proposes a novel semantic clustering module that integrates into the DRL training pipeline, improving interpretability and providing new analytical methods for understanding policy hierarchy and semantic structure.

Findings

01 Semantic clustering reveals internal organization of DRL models

02 The module improves interpretability without manual annotation

03 Experimental validation confirms effectiveness of the approach

Abstract

In this paper, we explore semantic clustering properties of deep reinforcement learning (DRL) to improve its interpretability and deepen our understanding of its internal semantic organization. In this context, semantic clustering refers to the ability of neural networks to cluster inputs based on their semantic similarity in the feature space. We propose a DRL architecture that incorporates a novel semantic clustering module that combines feature dimensionality reduction with online clustering. This module integrates seamlessly into the DRL training pipeline, addressing the instability of t-SNE and eliminating the need for extensive manual annotation inherent to prior semantic analysis methods. We experimentally validate the effectiveness of the proposed module and demonstrate its ability to reveal semantic clustering properties within DRL. Furthermore, we introduce new analytical…
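The abstract describes the module only at a high level: feature dimensionality reduction combined with online clustering. The sketch below is a hypothetical illustration of that general idea, not the authors' implementation. It assumes a fixed random linear projection as the reduction step and a running-mean (online k-means style) centroid update as the clustering step; the class name `OnlineSemanticClusterer` and all parameters are invented for this example.

```python
import numpy as np


class OnlineSemanticClusterer:
    """Hypothetical sketch: linear reduction followed by online k-means.

    This is NOT the paper's module. The random projection and the
    running-mean centroid update are stand-in assumptions.
    """

    def __init__(self, feature_dim, reduced_dim=2, num_clusters=4, seed=0):
        rng = np.random.default_rng(seed)
        # Assumption: a fixed random projection stands in for a learned reducer.
        self.projection = rng.normal(size=(feature_dim, reduced_dim)) / np.sqrt(feature_dim)
        self.centroids = rng.normal(size=(num_clusters, reduced_dim))
        self.counts = np.zeros(num_clusters, dtype=np.int64)

    def reduce(self, features):
        # Map (batch, feature_dim) features to (batch, reduced_dim).
        return np.asarray(features, dtype=np.float64) @ self.projection

    def assign(self, features):
        # Label each input with the index of its nearest centroid.
        z = self.reduce(features)
        dists = np.linalg.norm(z[:, None, :] - self.centroids[None, :, :], axis=-1)
        return dists.argmin(axis=1)

    def update(self, features):
        # Assign the whole batch first, then move each chosen centroid
        # toward its points with a running-mean step.
        z = self.reduce(features)
        labels = self.assign(features)
        for point, k in zip(z, labels):
            self.counts[k] += 1
            self.centroids[k] += (point - self.centroids[k]) / self.counts[k]
        return labels
```

In a training loop, one might call `update` on each batch of policy-network features, so cluster assignments are available during training rather than from a separate post hoc t-SNE pass. How the actual module learns its reduction and clusters is not stated in the excerpt above.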

Code & Models

Models

leonepson/semantic_rl (Hugging Face model)

Videos

Enhancing Interpretability in Deep Reinforcement Learning through Semantic Clustering (SlidesLive)

Taxonomy

Topics: Artificial Intelligence in Games · Digital Games and Media · Gambling Behavior and Treatments