Interpretable Learning Dynamics in Unsupervised Reinforcement Learning

Shashwat Pandey

arXiv:2505.06279 · cs.LG · May 13, 2025

TL;DR

This paper introduces an interpretability framework for unsupervised reinforcement learning agents, analyzing how intrinsic motivation influences attention, behavior, and internal representations, with new metrics for attention dynamics.

Contribution

It proposes a novel interpretability framework and metrics to analyze internal dynamics of unsupervised RL agents, highlighting the effects of curiosity-driven exploration and architecture.

Findings

01. Curiosity-driven agents exhibit broader, more dynamic attention.

02. Transformer-RND combines wide attention with structured representations.

03. Architectural biases and training signals significantly influence agent behavior.

Abstract

We present an interpretability framework for unsupervised reinforcement learning (URL) agents, aimed at understanding how intrinsic motivation shapes attention, behavior, and representation learning. We analyze five agents (DQN, RND, ICM, PPO, and a Transformer-RND variant) trained on procedurally generated environments, using Grad-CAM, Layer-wise Relevance Propagation (LRP), exploration metrics, and latent space clustering. To capture how agents perceive and adapt over time, we introduce two metrics: attention diversity, which measures the spatial breadth of focus, and attention change rate, which quantifies temporal shifts in attention. Our findings show that curiosity-driven agents display broader, more dynamic attention and more exploratory behavior than their extrinsically motivated counterparts. Among them, Transformer-RND combines wide attention, high exploration coverage, and compact,…
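The abstract names the two attention metrics but does not give their formulas here. The sketch below is one plausible formalization, not the paper's definition: it assumes attention diversity is the normalized Shannon entropy of a saliency heatmap (for example, a Grad-CAM map) and attention change rate is the mean total-variation distance between consecutive heatmaps. The function names and the `Heatmap` type are hypothetical.

```python
import math
from typing import List, Sequence

# A 2D saliency map with non-negative entries (hypothetical type alias).
Heatmap = Sequence[Sequence[float]]


def _normalize(heatmap: Heatmap) -> List[float]:
    """Flatten a heatmap, clip negatives to 0, and rescale to sum to 1."""
    flat = [max(float(v), 0.0) for row in heatmap for v in row]
    if not flat:
        raise ValueError("heatmap must contain at least one cell")
    total = sum(flat)
    if total == 0.0:
        # No attention anywhere: treat it as uniform.
        return [1.0 / len(flat)] * len(flat)
    return [v / total for v in flat]


def attention_diversity(heatmap: Heatmap) -> float:
    """ASSUMED definition: Shannon entropy of the map, scaled to [0, 1].

    0 means all attention sits on one cell; 1 means it is spread uniformly.
    """
    p = _normalize(heatmap)
    if len(p) < 2:
        return 0.0
    entropy = -sum(v * math.log(v) for v in p if v > 0.0)
    return entropy / math.log(len(p))


def attention_change_rate(heatmaps: Sequence[Heatmap]) -> float:
    """ASSUMED definition: mean total-variation distance between
    consecutive normalized heatmaps (0 = static, 1 = fully shifted)."""
    if len(heatmaps) < 2:
        return 0.0
    dists = []
    for prev, curr in zip(heatmaps, heatmaps[1:]):
        p, q = _normalize(prev), _normalize(curr)
        dists.append(0.5 * sum(abs(a - b) for a, b in zip(p, q)))
    return sum(dists) / len(dists)


if __name__ == "__main__":
    focused = [[1.0, 0.0], [0.0, 0.0]]   # all attention on one cell
    broad = [[1.0, 1.0], [1.0, 1.0]]     # attention spread evenly
    moved = [[0.0, 0.0], [0.0, 1.0]]     # focus jumped to another cell
    print(attention_diversity(focused), attention_diversity(broad))
    print(attention_change_rate([focused, moved]))
```

Under these assumptions, a broadly attending agent scores near 1 on diversity, and an agent whose focus jumps between frames scores near 1 on change rate. Other choices (thresholded area for diversity, L2 or KL divergence for change) would fit the abstract's wording equally well.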

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Embodied and Extended Cognition · Reinforcement Learning in Robotics

Methods: Softmax · Attention Is All You Need · Convolution · Q-Learning · Dense Connections · Entropy Regularization · Proximal Policy Optimization · Deep Q-Network