When Learning Rates Go Wrong: Early Structural Signals in PPO Actor-Critic
Alberto Fernández-Hernández, Cristian Pérez-Corral, Jose I. Mestre, Manuel F. Dolz, Jose Duato, Enrique S. Quintana-Ortí

TL;DR
This paper investigates how the learning rate affects the stability of PPO actor-critic training by analyzing neuron activation patterns with the Overfitting-Underfitting Indicator (OUI), a metric that predicts training success early on.
Contribution
It introduces an efficient batch-based formulation of the Overfitting-Underfitting Indicator (OUI) for early detection of training regimes in PPO, and derives a theoretical link between the learning rate and activation sign changes in hidden neurons.
Findings
OUI measured at 10% of training predicts LR regimes.
Critic networks perform best in an intermediate OUI range.
OUI-based screening outperforms other early-stopping criteria.
Abstract
Deep Reinforcement Learning systems are highly sensitive to the learning rate (LR), and selecting stable and performant training runs often requires extensive hyperparameter search. In Proximal Policy Optimization (PPO) actor-critic methods, small LR values lead to slow convergence, whereas large LR values may induce instability or collapse. We analyze this phenomenon through the behavior of the hidden neurons in the network using the Overfitting-Underfitting Indicator (OUI), a metric that quantifies the balance of binary activation patterns over a fixed probe batch. We introduce an efficient batch-based formulation of OUI and derive a theoretical connection between LR and activation sign changes, clarifying how a correct evolution of the neuron's inner structure depends on the step size. Empirically, across three discrete-control environments and multiple seeds, we show that OUI…
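To make the two ideas in the abstract concrete, the following is a minimal sketch of (a) a balance score over binary activation patterns on a fixed probe batch and (b) the fraction of activation signs that flip after a weight update of a given step size. It is not the paper's OUI formula, which this page does not state: `balance_score` is a hypothetical proxy, and the toy linear layer, the random `update` direction, and all function names are assumptions made for illustration only.

```python
import numpy as np


def binary_patterns(pre_activations):
    """Binary activation pattern: True where a ReLU-style unit would be active."""
    return pre_activations > 0


def balance_score(patterns):
    """Hypothetical balance proxy in [0, 1] (NOT the paper's OUI definition).

    For each neuron, p is the fraction of probe samples on which it is active.
    The score is 0 when every neuron is always on or always off, and 1 when
    every neuron is active on exactly half of the probe batch.
    """
    p = patterns.mean(axis=0)
    return float(np.mean(1.0 - np.abs(2.0 * p - 1.0)))


def flip_fraction(before, after):
    """Fraction of (sample, neuron) entries whose activation sign changed."""
    return float(np.mean(before != after))


# Toy setup (assumed): one linear layer, a fixed probe batch, and a random
# matrix standing in for a gradient direction.
rng = np.random.default_rng(0)
probe = rng.normal(size=(64, 8))      # fixed probe batch
weights = rng.normal(size=(8, 16))    # hidden-layer weights
update = rng.normal(size=(8, 16))     # stand-in for a gradient direction

base = binary_patterns(probe @ weights)
for lr in (0.0, 0.01, 1.0):
    stepped = binary_patterns(probe @ (weights - lr * update))
    print(f"lr={lr}: balance={balance_score(stepped):.3f}, "
          f"flips={flip_fraction(base, stepped):.3f}")
```

In this single-layer toy each pre-activation is linear in the step size, so the set of flipped signs can only grow as the step size increases along a fixed update direction. That mirrors, in a very simplified way, the abstract's point that the evolution of activation patterns depends on the step size.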
Taxonomy
Topics: Reinforcement Learning in Robotics · Neural Networks and Reservoir Computing · Advanced Memory and Neural Computing
