Investigating the Edge of Stability Phenomenon in Reinforcement Learning
Rares Iordan, Marc Peter Deisenroth, Mihaela Rosca

TL;DR
This paper investigates the edge of stability phenomenon in reinforcement learning, revealing its presence in off-policy deep RL and highlighting how different loss functions influence this dynamic, contrasting with supervised learning behaviors.
Contribution
The paper extends the study of the edge of stability phenomenon from supervised learning to reinforcement learning, showing that it occurs in off-policy algorithms and that the choice of loss function affects how strongly it appears.
Findings
Edge of stability observed in off-policy deep RL.
DQN with Huber loss shows strong edge of stability effects.
C51 with cross entropy loss does not exhibit the phenomenon.
Abstract
Recent progress has been made in understanding optimisation dynamics in neural networks trained with full-batch gradient descent with momentum with the uncovering of the edge of stability phenomenon in supervised learning. The edge of stability phenomenon occurs as the leading eigenvalue of the Hessian reaches the divergence threshold of the underlying optimisation algorithm for a quadratic loss, after which it starts oscillating around the threshold, and the loss starts to exhibit local instability but decreases over long time frames. In this work, we explore the edge of stability phenomenon in reinforcement learning (RL), specifically off-policy Q-learning algorithms across a variety of data regimes, from offline to online RL. Our experiments reveal that, despite significant differences to supervised learning, such as non-stationarity of the data distribution and the use of…
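The divergence threshold mentioned in the abstract can be illustrated on a toy quadratic loss. The sketch below is not the authors' code. It assumes a hand-picked diagonal Hessian, uses hypothetical helper names (leading_eigenvalue, divergence_threshold, run_gd), and relies on the standard quadratic-loss thresholds of 2/lr for plain gradient descent and (2 + 2*momentum)/lr for heavy-ball momentum. Gradient descent shrinks the loss while the leading Hessian eigenvalue stays below the threshold and blows up once it exceeds it.

```python
import numpy as np


def leading_eigenvalue(hessian, num_iters=200, seed=0):
    """Estimate the largest eigenvalue of a symmetric PSD matrix by power iteration."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=hessian.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        hv = hessian @ v
        v = hv / np.linalg.norm(hv)
    # Rayleigh quotient of the converged unit vector.
    return float(v @ hessian @ v)


def divergence_threshold(lr, momentum=0.0):
    """Stability threshold on a quadratic loss for heavy-ball gradient descent."""
    return (2.0 + 2.0 * momentum) / lr


def run_gd(hessian, lr, momentum=0.0, steps=200):
    """Run heavy-ball GD on L(theta) = 0.5 * theta^T H theta and return the final loss."""
    theta = np.ones(hessian.shape[0])
    velocity = np.zeros_like(theta)
    for _ in range(steps):
        grad = hessian @ theta
        velocity = momentum * velocity - lr * grad
        theta = theta + velocity
    return float(0.5 * theta @ hessian @ theta)


# Toy Hessian (an assumption for illustration): eigenvalues 10, 1 and 0.1.
hessian = np.diag([10.0, 1.0, 0.1])
sharpness = leading_eigenvalue(hessian)
initial_loss = float(0.5 * np.ones(3) @ hessian @ np.ones(3))

# lr = 0.19 puts the threshold 2/lr above the sharpness; lr = 0.21 puts it below.
stable_loss = run_gd(hessian, lr=0.19)
unstable_loss = run_gd(hessian, lr=0.21)

# Momentum raises the threshold, so lr = 0.3 is stable with momentum 0.9
# even though plain GD diverges at the same learning rate.
momentum_loss = run_gd(hessian, lr=0.3, momentum=0.9)
plain_loss = run_gd(hessian, lr=0.3)

print(f"sharpness={sharpness:.3f}, initial loss={initial_loss:.3f}")
print(f"lr=0.19: threshold={divergence_threshold(0.19):.2f}, final loss={stable_loss:.3e}")
print(f"lr=0.21: threshold={divergence_threshold(0.21):.2f}, final loss={unstable_loss:.3e}")
print(f"lr=0.30, momentum=0.9: threshold={divergence_threshold(0.3, 0.9):.2f}, final loss={momentum_loss:.3e}")
```

In a neural network the Hessian changes during training, which is what allows its leading eigenvalue to rise to the threshold and then oscillate around it. The fixed Hessian in this sketch only shows why the threshold marks the boundary between convergence and divergence.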
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Machine Learning and ELM
Methods: Convolution · Dense Connections · Deep Q-Network · Q-Learning · Huber loss
