Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning

Roger Creus Castanyer; Johan Obando-Ceron; Lu Li; Pierre-Luc Bacon; Glen Berseth; Aaron Courville; Pablo Samuel Castro

arXiv:2506.15544 · cs.LG · February 3, 2026

Open Access

TL;DR

This paper investigates why performance degrades when deep reinforcement learning networks are scaled up, identifies the combination of non-stationarity and gradient pathologies as the key factor, and proposes simple interventions that stabilize gradient flow and improve scalability.

Contribution

The paper introduces straightforward gradient stabilization techniques that address scale-related challenges in deep reinforcement learning, supported by empirical validation across various environments.

Findings

01. Gradient pathologies combined with non-stationarity hinder large-scale RL performance.

02. Simple interventions can stabilize gradient flow across different network sizes.

03. Proposed methods enable robust RL performance at scale.

Abstract

Scaling deep reinforcement learning networks is challenging and often results in degraded performance, yet the root causes of this failure mode remain poorly understood. Several recent works have proposed mechanisms to address this, but they are often complex and fail to highlight the causes underlying this difficulty. In this work, we conduct a series of empirical analyses which suggest that the combination of non-stationarity with gradient pathologies, due to suboptimal architectural choices, underlies the challenges of scale. We propose a series of direct interventions that stabilize gradient flow, enabling robust performance across a range of network depths and widths. Our interventions are simple to implement and compatible with well-established algorithms, and result in an effective mechanism that enables strong performance even at large scales. We validate our findings on a…
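The abstract refers to gradient flow across network depths but this page does not describe the paper's diagnostics or interventions. As a generic illustration only, the sketch below shows one common way to inspect gradient flow: it computes the per-layer weight-gradient norm of a plain ReLU MLP by manual backpropagation. The function name `layer_grad_norms`, the toy squared-output loss, the initialization, and all sizes are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def layer_grad_norms(depth, width=64, batch=32, seed=0):
    """Per-layer weight-gradient norms for a plain ReLU MLP.

    Hypothetical diagnostic: no residual connections, no normalization,
    and a toy loss L = 0.5 * mean_over_batch(sum(output ** 2)).
    """
    rng = np.random.default_rng(seed)
    # He-style initialization for square (width x width) layers.
    weights = [
        rng.normal(0.0, np.sqrt(2.0 / width), size=(width, width))
        for _ in range(depth)
    ]
    x = rng.normal(size=(batch, width))

    # Forward pass, caching each layer's input and pre-activation.
    inputs, pre_acts = [], []
    h = x
    for w in weights:
        inputs.append(h)
        z = h @ w
        pre_acts.append(z)
        h = np.maximum(z, 0.0)

    # Gradient of the toy loss with respect to the network output.
    grad_h = h / batch

    # Backward pass, recording the Frobenius norm of each dL/dW.
    norms = [0.0] * depth
    for i in reversed(range(depth)):
        grad_z = grad_h * (pre_acts[i] > 0)       # ReLU derivative
        norms[i] = float(np.linalg.norm(inputs[i].T @ grad_z))
        grad_h = grad_z @ weights[i].T            # propagate to layer input
    return norms


if __name__ == "__main__":
    for depth in (2, 8, 32):
        norms = layer_grad_norms(depth)
        print(f"depth={depth:2d}  first layer={norms[0]:.3e}  last layer={norms[-1]:.3e}")
```

Comparing the first-layer and last-layer norms as depth grows gives a rough view of whether gradients shrink or blow up on their way back through the network. In a reinforcement learning setting the same measurement would be repeated over the course of training, since the data distribution and targets change as the policy changes.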

Taxonomy

Topics: Reinforcement Learning in Robotics · Neural Networks and Applications · Evolutionary Algorithms and Applications