Stagewise Reinforcement Learning and the Geometry of the Regret Landscape
Chris Elliott, Einar Urdshals, David Quarel, Matthew Farrugia-Roberts, Daniel Murfet

TL;DR
This paper extends singular learning theory to reinforcement learning, showing how the geometry of the regret landscape influences policy development and phase transitions during training.
Contribution
It proves that the local learning coefficient (LLC), an invariant of the geometry of the regret function, governs how a generalized posterior over policies concentrates in RL, linking the theory to empirically observed stagewise learning.
Findings
The theory predicts that deep RL trained with SGD progresses from simple policies with high regret to complex policies with low regret.
Phase transitions during training appear as "opposing staircases": regret decreases sharply while the LLC increases.
Empirical verification in a gridworld environment supports the theoretical predictions.
Abstract
Singular learning theory characterizes Bayesian learning as an evolving tradeoff between accuracy and complexity, with transitions between qualitatively different solutions as sample size increases. We extend this theory to reinforcement learning, proving that the concentration of a generalized posterior over policies is governed by the local learning coefficient (LLC), an invariant of the geometry of the regret function. This theory predicts that deep reinforcement learning with SGD should proceed from simple policies with high regret to complex policies with low regret. We verify this prediction empirically in a gridworld environment exhibiting stagewise policy development: phase transitions over training manifest as "opposing staircases" where regret decreases sharply while the LLC increases.
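The accuracy-complexity tradeoff described in the abstract can be illustrated with a toy calculation. The sketch below assumes the leading-order free energy form familiar from singular learning theory, n * loss + LLC * log(n), and assumes regret plays the role of the loss in the RL setting. The two candidate solutions and their (regret, LLC) values are hypothetical numbers chosen for illustration, not results from the paper.

```python
import math

# Hypothetical (regret, LLC) pairs for two candidate policy regions.
# These numbers are illustrative assumptions, not values from the paper.
PHASES = {
    "simple": (0.5, 2.0),    # high regret, low complexity
    "complex": (0.1, 10.0),  # low regret, high complexity
}


def free_energy(regret, llc, n):
    """Leading-order free energy: n * regret + llc * log(n)."""
    return n * regret + llc * math.log(n)


def preferred_phase(n):
    """Return the name of the phase with the lowest free energy at sample size n."""
    return min(PHASES, key=lambda name: free_energy(*PHASES[name], n))


if __name__ == "__main__":
    for n in (2, 10, 100, 1000):
        print(n, preferred_phase(n))
```

Under these assumed values, the simple solution is preferred at small n because the complexity term dominates, and the complex solution takes over once the accumulated regret penalty outweighs its larger LLC. That switch is the kind of transition the abstract describes.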
Taxonomy
Topics: Reinforcement Learning in Robotics · Opinion Dynamics and Social Influence · Advanced Bandit Algorithms Research
