Interpretable Learning Dynamics in Unsupervised Reinforcement Learning
Shashwat Pandey

TL;DR
This paper introduces an interpretability framework for unsupervised reinforcement learning agents, analyzing how intrinsic motivation influences attention, behavior, and internal representations, with new metrics for attention dynamics.
Contribution
It proposes a novel interpretability framework and metrics to analyze internal dynamics of unsupervised RL agents, highlighting the effects of curiosity-driven exploration and architecture.
Findings
Curiosity-driven agents exhibit broader, more dynamic attention.
Transformer-RND combines wide attention with structured representations.
Architectural biases and training signals significantly influence agent behavior.
Abstract
We present an interpretability framework for unsupervised reinforcement learning (URL) agents, aimed at understanding how intrinsic motivation shapes attention, behavior, and representation learning. We analyze five agents (DQN, RND, ICM, PPO, and a Transformer-RND variant) trained on procedurally generated environments, using Grad-CAM, Layer-wise Relevance Propagation (LRP), exploration metrics, and latent space clustering. To capture how agents perceive and adapt over time, we introduce two metrics: attention diversity, which measures the spatial breadth of focus, and attention change rate, which quantifies temporal shifts in attention. Our findings show that curiosity-driven agents display broader, more dynamic attention and exploratory behavior than their extrinsically motivated counterparts. Among them, Transformer-RND combines wide attention, high exploration coverage, and compact,…
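The abstract names the two metrics but does not give their formulas here, so the following is only a rough sketch of how such quantities could be computed from saliency maps (for example, Grad-CAM outputs). It assumes attention diversity is the normalized Shannon entropy of a saliency map and attention change rate is the mean total-variation distance between consecutive normalized maps. Both definitions, and the function names `attention_diversity` and `attention_change_rate`, are illustrative assumptions and may differ from the paper's own.

```python
import numpy as np


def attention_diversity(saliency, eps=1e-12):
    """Assumed definition: normalized Shannon entropy of one saliency map.

    Returns a value in [0, 1]: 0 when all mass sits on one pixel,
    1 when mass is spread uniformly over every pixel.
    """
    s = np.clip(np.asarray(saliency, dtype=np.float64).ravel(), 0.0, None)
    total = s.sum()
    if total <= eps or s.size < 2:
        return 0.0
    p = s / total
    nz = p[p > 0]  # skip zero entries so log() is well defined
    entropy = -np.sum(nz * np.log(nz))
    return float(entropy / np.log(s.size))


def attention_change_rate(saliency_maps, eps=1e-12):
    """Assumed definition: mean total-variation distance between
    consecutive normalized saliency maps, shape (T, H, W).

    Returns a value in [0, 1]: 0 when attention never moves,
    1 when consecutive maps share no mass at all.
    """
    maps = np.clip(np.asarray(saliency_maps, dtype=np.float64), 0.0, None)
    if maps.shape[0] < 2:
        return 0.0
    flat = maps.reshape(maps.shape[0], -1)
    probs = flat / np.maximum(flat.sum(axis=1, keepdims=True), eps)
    tv = 0.5 * np.abs(np.diff(probs, axis=0)).sum(axis=1)
    return float(tv.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    episode = rng.random((10, 8, 8))  # stand-in for 10 frames of saliency
    print("diversity of frame 0:", attention_diversity(episode[0]))
    print("change rate over episode:", attention_change_rate(episode))
```

Under these assumed definitions, a broadly attending agent would score closer to 1 on diversity, and an agent whose focus shifts often would score higher on change rate, which matches the qualitative contrast the abstract draws between curiosity-driven and extrinsically motivated agents.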
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Embodied and Extended Cognition · Reinforcement Learning in Robotics
Methods: Softmax · Attention Is All You Need · Convolution · Q-Learning · Dense Connections · Entropy Regularization · Proximal Policy Optimization · Deep Q-Network
