On Double Descent in Reinforcement Learning with LSTD and Random Features
David Brellmann, Eloïse Berthier, David Filliat, Goran Frehse

TL;DR
This paper provides a theoretical analysis of how network size and regularization affect the performance of TD algorithms in deep RL, revealing a double descent phenomenon related to the parameter-to-state ratio.
Contribution
It introduces a theoretical framework for understanding over-parameterization and double descent in RL with LSTD and random features, supported by asymptotic analysis.
Findings
A double descent phenomenon is observed around a parameter/state ratio of one.
Regularization and unvisited states influence the correction terms in the Mean-Squared Bellman Error (MSBE).
Theoretical predictions closely match numerical experiments.
Abstract
Temporal Difference (TD) algorithms are widely used in Deep Reinforcement Learning (RL). Their performance is heavily influenced by the size of the neural network. While in supervised learning, the regime of over-parameterization and its benefits are well understood, the situation in RL is much less clear. In this paper, we present a theoretical analysis of the influence of network size and $\ell_2$-regularization on performance. We identify the ratio between the number of parameters and the number of visited states as a crucial factor and define over-parameterization as the regime when it is larger than one. Furthermore, we observe a double descent phenomenon, i.e., a sudden drop in performance around the parameter/state ratio of one. Leveraging random features and the lazy training regime, we study the regularized Least-Square Temporal Difference (LSTD) algorithm in an asymptotic regime,…
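To make the setting in the abstract concrete, here is a minimal sketch of regularized LSTD on random features. It is not the authors' code. The synthetic Markov reward process, the ReLU feature map, the 1/n scaling of the linear system, and all names (`random_features`, `regularized_lstd`, `empirical_msbe`, `run`) are assumptions made for illustration. The error is an empirical MSBE on the sampled transitions, which may differ from the quantity analyzed in the paper.

```python
import numpy as np

def random_features(S, W):
    """ReLU random features: one row per state, one column per feature."""
    return np.maximum(S @ W.T, 0.0)

def regularized_lstd(Phi, Phi_next, r, gamma, lam):
    """Solve (Phi^T (Phi - gamma * Phi_next) / n + lam * I) theta = Phi^T r / n."""
    n, N = Phi.shape
    A = Phi.T @ (Phi - gamma * Phi_next) / n + lam * np.eye(N)
    b = Phi.T @ r / n
    return np.linalg.solve(A, b)

def empirical_msbe(Phi, Phi_next, r, gamma, theta):
    """Mean squared TD error of the linear value estimate on the sampled transitions."""
    td_error = r + gamma * (Phi_next @ theta) - Phi @ theta
    return float(np.mean(td_error ** 2))

def run(N, n_states=50, d=5, n=200, gamma=0.9, lam=1e-3, seed=0):
    """Fit regularized LSTD with N random features on a synthetic MRP (assumed setup)."""
    rng = np.random.default_rng(seed)
    S = rng.normal(size=(n_states, d)) / np.sqrt(d)      # state descriptors
    P = rng.dirichlet(np.ones(n_states), size=n_states)  # row-stochastic transitions
    R = rng.normal(size=n_states)                        # reward attached to each state

    # Sample one trajectory of n transitions.
    idx = np.empty(n + 1, dtype=int)
    idx[0] = rng.integers(n_states)
    for t in range(n):
        idx[t + 1] = rng.choice(n_states, p=P[idx[t]])

    # Fixed random first layer; only the linear weights theta are fitted.
    W = rng.normal(size=(N, d))
    Phi_all = random_features(S, W) / np.sqrt(N)
    Phi, Phi_next = Phi_all[idx[:-1]], Phi_all[idx[1:]]
    r = R[idx[:-1]]

    theta = regularized_lstd(Phi, Phi_next, r, gamma, lam)
    m = np.unique(idx[:-1]).size                         # number of distinct visited states
    return theta, empirical_msbe(Phi, Phi_next, r, gamma, theta), N / m

if __name__ == "__main__":
    # Sweep the number of features to see how the error varies with the ratio N/m.
    for N in (10, 25, 50, 100, 400):
        _, msbe, ratio = run(N)
        print(f"N={N:4d}  N/m={ratio:5.2f}  empirical MSBE={msbe:.4f}")
```

Sweeping `N` across values below and above the number of distinct visited states `m`, and repeating with several values of `lam`, is one way to inspect how the error behaves near a parameter/state ratio of one.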
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Multi-Objective Optimization Algorithms · Evolutionary Algorithms and Applications
