When Learning Rates Go Wrong: Early Structural Signals in PPO Actor-Critic

Alberto Fernández-Hernández; Cristian Pérez-Corral; Jose I. Mestre; Manuel F. Dolz; Jose Duato; Enrique S. Quintana-Ortí

arXiv:2603.09950 · cs.LG · March 11, 2026

TL;DR

This paper investigates how the learning rate affects the stability of PPO actor-critic training. It analyzes neuron activation patterns with the Overfitting-Underfitting Indicator (OUI), a metric that predicts training success early in a run.

Contribution

It introduces the Overfitting-Underfitting Indicator (OUI), a novel metric for early detection of training regimes in PPO, linking neuron behavior to learning rate effects.

Findings

01 OUI measured at 10% of training predicts LR regimes.

02 Critic networks perform best in an intermediate OUI range.

03 OUI-based screening outperforms other early-stopping criteria.

Abstract

Deep Reinforcement Learning systems are highly sensitive to the learning rate (LR), and selecting stable and performant training runs often requires extensive hyperparameter search. In Proximal Policy Optimization (PPO) actor-critic methods, small LR values lead to slow convergence, whereas large LR values may induce instability or collapse. We analyse this phenomenon through the behavior of the hidden neurons in the network using the Overfitting-Underfitting Indicator (OUI), a metric that quantifies the balance of binary activation patterns over a fixed probe batch. We introduce an efficient batch-based formulation of OUI and derive a theoretical connection between LR and activation sign changes, clarifying how a correct evolution of the neuron's inner structure depends on the step size. Empirically, across three discrete-control environments and multiple seeds, we show that OUI…
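To make the idea of "binary activation patterns over a fixed probe batch" concrete, the following is a minimal sketch and not the paper's OUI formula, which this page does not reproduce. It assumes ReLU-style hidden units whose pre-activations have already been collected for a probe batch. The function names `activation_pattern`, `activation_balance`, and `sign_flip_fraction` are hypothetical, and the balance score is only an illustrative proxy for how evenly each neuron splits the batch between active and inactive.

```python
import numpy as np


def activation_pattern(pre_activations):
    """Binary pattern: True where a unit's pre-activation is positive."""
    return np.asarray(pre_activations, dtype=float) > 0


def activation_balance(pre_activations):
    """Hypothetical balance score in [0, 1]; NOT the paper's OUI definition.

    Input shape is (batch, neurons). A neuron that is active on half of the
    probe batch scores 1; a neuron that is always on or always off scores 0.
    The result is the mean score over neurons.
    """
    pattern = activation_pattern(pre_activations)
    p_active = pattern.mean(axis=0)  # fraction of probe samples activating each neuron
    return float(np.mean(1.0 - np.abs(2.0 * p_active - 1.0)))


def sign_flip_fraction(pre_before, pre_after):
    """Fraction of (sample, neuron) entries whose activation sign changed
    between two parameter snapshots evaluated on the same probe batch."""
    return float(np.mean(activation_pattern(pre_before) != activation_pattern(pre_after)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in pre-activations for a probe batch of 64 samples and 32 neurons.
    before = rng.normal(size=(64, 32))
    # Assumption for illustration: an update perturbs the pre-activations,
    # with a larger perturbation standing in for a larger step size.
    small_step = before + 0.01 * rng.normal(size=before.shape)
    large_step = before + 1.00 * rng.normal(size=before.shape)
    print("balance:", activation_balance(before))
    print("flips (small step):", sign_flip_fraction(before, small_step))
    print("flips (large step):", sign_flip_fraction(before, large_step))
```

In this toy setup the perturbation scale stands in for the step size, so it only illustrates the qualitative link between LR and activation sign changes that the abstract describes. In a real PPO run the pre-activations would come from the actor or critic evaluated on the same fixed probe batch at successive checkpoints.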

Taxonomy

Topics: Reinforcement Learning in Robotics · Neural Networks and Reservoir Computing · Advanced Memory and Neural Computing