StructRL: Recovering Dynamic Programming Structure from Learning Dynamics in Distributional Reinforcement Learning

Ivo Nowak

arXiv:2604.08620 · cs.LG · April 21, 2026

TL;DR

This paper provides evidence that the learning dynamics of distributional reinforcement learning reveal an underlying dynamic programming structure, which can then be exploited to guide sampling and improve learning efficiency.

Contribution

It introduces a method that recovers dynamic programming-like structure from the learning dynamics of distributional RL and exploits it, without requiring an explicit model of the environment.

Findings

01. The temporal evolution of return distributions indicates when and where in the state space learning occurs.

02. A temporal learning indicator t*(s) captures when each state undergoes its strongest learning update during training.

03. Using these signals, StructRL guides sampling in line with the learned propagation structure.

Abstract

Reinforcement learning is typically treated as a uniform, data-driven optimization process, where updates are guided by rewards and temporal-difference errors without explicitly exploiting global structure. In contrast, dynamic programming methods rely on structured information propagation, enabling efficient and stable learning. In this paper, we provide evidence that such structure can be recovered from the learning dynamics of distributional reinforcement learning. By analyzing the temporal evolution of return distributions, we identify signals that capture when and where learning occurs in the state space. In particular, we introduce a temporal learning indicator t*(s) that reflects when a state undergoes its strongest learning update during training. Empirically, this signal induces an ordering over states that is consistent with a dynamic programming-style propagation of…
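The indicator t*(s) can be illustrated with a small sketch. It makes several assumptions that go beyond what the paper states. Return distributions are stored as equally weighted quantile estimates. The size of a learning update is the 1-Wasserstein distance between consecutive snapshots of a state's distribution. t*(s) is the snapshot step with the largest such update. The toy chain environment, its random terminal reward, and the synchronous full-backup training loop are hypothetical stand-ins for sample-based training. All function names are illustrative only.

```python
import numpy as np


def chain_snapshots(n_states=5, gamma=0.9, sweeps=8,
                    terminal_quantiles=(0.0, 0.0, 2.0, 2.0)):
    """Hypothetical toy chain: state s moves to s + 1; leaving the last state ends the
    episode with a reward of 0 or 2 (p = 0.5 each), represented by 4 quantiles.
    Returns quantile snapshots with shape (sweeps + 1, n_states, n_quantiles)."""
    terminal = np.asarray(terminal_quantiles, dtype=float)
    q = np.zeros((n_states, terminal.size))
    history = [q.copy()]
    for _ in range(sweeps):
        new_q = np.empty_like(q)
        for s in range(n_states):
            if s == n_states - 1:
                new_q[s] = terminal            # terminal step: return equals the reward
            else:
                new_q[s] = gamma * q[s + 1]    # distributional Bellman backup, zero reward
        q = new_q                              # synchronous sweep with step size 1 (assumption)
        history.append(q.copy())
    return np.stack(history)


def update_magnitudes(snapshots):
    """1-Wasserstein distance between consecutive snapshots of each state's distribution.
    For equally weighted atoms this is the mean absolute difference of the sorted atoms.
    Returns shape (T, n_states)."""
    q = np.sort(snapshots, axis=-1)
    return np.mean(np.abs(q[1:] - q[:-1]), axis=-1)


def temporal_learning_indicator(snapshots):
    """Assumed form of t*(s): the 1-based step at which state s has its largest update."""
    return np.argmax(update_magnitudes(snapshots), axis=0) + 1


if __name__ == "__main__":
    snaps = chain_snapshots()
    t_star = temporal_learning_indicator(snaps)
    print("t*(s):", t_star.tolist())                      # [5, 4, 3, 2, 1]
    print("states by t*:", np.argsort(t_star).tolist())   # [4, 3, 2, 1, 0]
```

On this idealized chain the state next to the reward has its strongest update first, and each earlier state follows one sweep later. Sorting by t*(s) therefore recovers the backward, dynamic programming-style propagation order. With sampled transitions and small step sizes, updates would be spread over many steps, so t*(s) would likely be a noisier signal than in this toy case.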
