On The Presence of Double-Descent in Deep Reinforcement Learning

Viktor Veselý; Aleksandar Todorov; Matthia Sabatelli

arXiv:2511.06895·cs.LG·November 11, 2025

TL;DR

This paper presents preliminary evidence that double descent occurs in model-free deep reinforcement learning. The second descent coincides with a sustained drop in policy entropy, which the authors interpret as over-parameterization steering policies toward more robust solutions.

Contribution

The paper systematically investigates double descent in model-free DRL across varying model capacity within the Actor-Critic framework, using an information-theoretic metric, Policy Entropy, to relate the phenomenon to policy robustness and generalization.

Findings

01

A clear epoch-wise double descent curve is observed in model-free DRL training.

02

Entry into the second descent region correlates with a sustained, significant reduction in policy entropy.

03

The entropy decay suggests that over-parameterization acts as an implicit regularizer, guiding policies toward flatter minima.

Abstract

The double descent (DD) paradox, where over-parameterized models see generalization improve past the interpolation point, remains largely unexplored in the non-stationary domain of Deep Reinforcement Learning (DRL). We present preliminary evidence that DD exists in model-free DRL, investigating it systematically across varying model capacity using the Actor-Critic framework. We rely on an information-theoretic metric, Policy Entropy, to measure policy uncertainty throughout training. Preliminary results show a clear epoch-wise DD curve; the policy's entrance into the second descent region correlates with a sustained, significant reduction in Policy Entropy. This entropic decay suggests that over-parameterization acts as an implicit regularizer, guiding the policy towards robust, flatter minima in the loss landscape. These findings establish DD as a factor in DRL and provide an…
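The abstract names Policy Entropy as its measure of policy uncertainty but this excerpt does not give the exact definition. As a minimal sketch, assuming a discrete action space and the standard Shannon entropy of a categorical policy, H(π(·|s)) = -Σ_a π(a|s) log π(a|s), averaged over a batch of states, the quantity could be computed as follows. The function name `policy_entropy` and the use of raw logits as input are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def policy_entropy(logits):
    """Mean Shannon entropy (in nats) of categorical policies.

    `logits` has shape (num_states, num_actions); this interface is an
    assumption made for illustration, not taken from the paper.
    """
    logits = np.asarray(logits, dtype=np.float64)
    # Numerically stable log-softmax over the action axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    probs = np.exp(log_probs)
    # H(pi(.|s)) = -sum_a pi(a|s) * log pi(a|s), then average over states.
    per_state = -(probs * log_probs).sum(axis=-1)
    return float(per_state.mean())

# A uniform policy over 4 actions is maximally uncertain.
uniform = policy_entropy(np.zeros((1, 4)))
# A policy that strongly prefers one action has much lower entropy.
peaked = policy_entropy(np.array([[10.0, 0.0, 0.0, 0.0]]))
```

Under these assumptions the uniform policy over four actions yields ln 4 ≈ 1.386 nats, and the peaked policy yields a value close to zero. A sustained fall in this number over training is the kind of signal the abstract associates with the second descent.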

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Robot Manipulation and Learning