Stagewise Reinforcement Learning and the Geometry of the Regret Landscape

Chris Elliott; Einar Urdshals; David Quarel; Matthew Farrugia-Roberts; Daniel Murfet

arXiv:2601.07524 · cs.LG · February 26, 2026

TL;DR

This paper extends singular learning theory to reinforcement learning, showing how the geometry of the regret landscape influences policy development and phase transitions during training.

Contribution

The paper proves that the local learning coefficient (LLC), a geometric invariant of the regret function, governs how a generalized posterior over policies concentrates in RL, and it connects this result to empirically observed stagewise learning.

Findings

01 The theory predicts that deep RL trained with SGD proceeds from simple, high-regret policies to complex, low-regret policies.

02 Phase transitions during training appear as "opposing staircases": regret drops sharply while the LLC rises.

03 Experiments in a gridworld environment exhibiting stagewise policy development confirm this prediction.

Abstract

Singular learning theory characterizes Bayesian learning as an evolving tradeoff between accuracy and complexity, with transitions between qualitatively different solutions as sample size increases. We extend this theory to reinforcement learning, proving that the concentration of a generalized posterior over policies is governed by the local learning coefficient (LLC), an invariant of the geometry of the regret function. This theory predicts that deep reinforcement learning with SGD should proceed from simple policies with high regret to complex policies with low regret. We verify this prediction empirically in a gridworld environment exhibiting stagewise policy development: phase transitions over training manifest as "opposing staircases" where regret decreases sharply while the LLC increases.
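
The accuracy-complexity tradeoff described in the abstract can be illustrated with a small numerical sketch. It assumes the standard leading-order free energy from singular learning theory, n * regret + LLC * log(n), carries over to the regret setting with n playing the role of sample size; the paper's exact statement may differ. The two (regret, LLC) pairs and the function names are hypothetical and are not taken from the paper.

```python
import math


def free_energy(n, regret, llc):
    """Leading-order free energy n * regret + llc * log(n) (assumed form)."""
    return n * regret + llc * math.log(n)


# Hypothetical solutions, chosen only for illustration (not from the paper):
# a simple policy with high regret and a complex policy with low regret.
SIMPLE = {"regret": 0.5, "llc": 1.0}
COMPLEX = {"regret": 0.1, "llc": 5.0}


def preferred(n):
    """Return the solution with the lower free energy at sample size n."""
    f_simple = free_energy(n, **SIMPLE)
    f_complex = free_energy(n, **COMPLEX)
    return "simple" if f_simple <= f_complex else "complex"


def crossover(max_n=10_000):
    """Smallest n >= 2 at which the complex solution becomes preferred."""
    for n in range(2, max_n + 1):
        if preferred(n) == "complex":
            return n
    return None


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, preferred(n))
    print("crossover at n =", crossover())
```

At small n the log(n) complexity penalty dominates and the simple, high-regret solution is preferred. At large n the linear regret term dominates and the complex, low-regret solution takes over. Under the stated assumptions, that switch is what would appear as one step of the "opposing staircases": regret falls while the LLC rises.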

Taxonomy

Topics: Reinforcement Learning in Robotics · Opinion Dynamics and Social Influence · Advanced Bandit Algorithms Research