On The Presence of Double-Descent in Deep Reinforcement Learning
Viktor Veselý, Aleksandar Todorov, Matthia Sabatelli

TL;DR
This paper provides initial evidence that double descent phenomena occur in deep reinforcement learning, with over-parameterization leading to more robust policies as indicated by entropy reduction.
Contribution
It is the first to systematically investigate double descent in DRL using an information-theoretic metric, revealing its impact on policy robustness and generalization.
Findings
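Epoch-wise double descent observed in model-free DRL with an Actor-Critic setup (preliminary results).
Entry into the second descent region correlates with a sustained reduction in policy entropy.
The entropy decay suggests that over-parameterization acts as an implicit regularizer, guiding policies toward flatter minima.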
Abstract
The double descent (DD) paradox, where over-parameterized models see generalization improve past the interpolation point, remains largely unexplored in the non-stationary domain of Deep Reinforcement Learning (DRL). We present preliminary evidence that DD exists in model-free DRL, investigating it systematically across varying model capacity using the Actor-Critic framework. We rely on an information-theoretic metric, Policy Entropy, to measure policy uncertainty throughout training. Preliminary results show a clear epoch-wise DD curve; the policy's entrance into the second descent region correlates with a sustained, significant reduction in Policy Entropy. This entropic decay suggests that over-parameterization acts as an implicit regularizer, guiding the policy towards robust, flatter minima in the loss landscape. These findings establish DD as a factor in DRL and provide an…
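As an illustration of the Policy Entropy metric named in the abstract, the sketch below computes the mean Shannon entropy of a categorical policy from a batch of action logits. It assumes a discrete action space and NumPy. The function name `policy_entropy` and the logits-as-input convention are hypothetical choices made for this example, not details taken from the paper.

```python
import numpy as np


def policy_entropy(logits):
    """Mean Shannon entropy (in nats) of a batch of categorical policies.

    `logits` has shape (num_states, num_actions). The name and input
    convention are assumptions for illustration, not the authors' code.
    """
    logits = np.asarray(logits, dtype=np.float64)
    # Log-softmax with max-subtraction for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    probs = np.exp(log_probs)
    # H(pi(.|s)) = -sum_a pi(a|s) * log pi(a|s), averaged over states.
    return float(-(probs * log_probs).sum(axis=-1).mean())


# A uniform policy has maximal entropy; a near-deterministic one is close to zero.
uniform = policy_entropy([[0.0, 0.0, 0.0, 0.0]])
peaked = policy_entropy([[100.0, 0.0, 0.0, 0.0]])
print(uniform, peaked)
```

Logged once per epoch alongside the return or loss curve, a quantity like this is what would let one check whether a sustained entropy drop coincides with the second descent.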
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Robot Manipulation and Learning
