Bridging the Gap Between Target Networks and Functional Regularization

Alexandre Piché; Valentin Thomas; Rafael Pardinas; Joseph Marino; Gian Maria Marconi; Christopher Pal; Mohammad Emtiyaz Khan

arXiv:2106.02613 · stat.ML · September 8, 2023

TL;DR

This paper analyzes the role of Target Networks in deep Reinforcement Learning, revealing their regularization effect, and proposes an explicit Functional Regularization method that improves stability and performance.

Contribution

It introduces a novel explicit Functional Regularization approach as a flexible alternative to Target Networks, with theoretical convergence analysis and empirical validation.

Findings

01 Functional Regularization can replace Target Networks effectively.

02 Adjusting the regularization weight and update period improves performance.

03 The new method recovers Q-values more accurately.

Abstract

Bootstrapping is behind much of the success of deep Reinforcement Learning. However, learning the value function via bootstrapping often leads to unstable training due to fast-changing target values. Target Networks are employed to stabilize training by using an additional set of lagging parameters to estimate the target values. Despite the popularity of Target Networks, their effect on the optimization is still misunderstood. In this work, we show that they act as an implicit regularizer, which can be beneficial in some cases but also has disadvantages: it is inflexible and can result in instabilities, even when vanilla TD(0) converges. To overcome these issues, we propose an explicit Functional Regularization alternative that is flexible and is a convex regularizer in function space, and we theoretically study its convergence. We conduct an experimental study across a range of…
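To make the contrast in the abstract concrete, here is a minimal JAX sketch of the two objectives. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not state the loss, so the form of the regularizer (a squared distance between the current and lagging Q-values, weighted by a hypothetical `kappa`) is an assumption, as are the linear `q_values` function, the batch layout, and all names.

```python
import jax
import jax.numpy as jnp


def q_values(params, features):
    # Hypothetical linear Q-function: one value per row of `features`.
    return features @ params


def target_network_loss(params, lagging_params, batch, gamma=0.99):
    # Target Network: bootstrap from the lagging parameters.
    target = batch["reward"] + gamma * q_values(lagging_params, batch["next_features"])
    td_error = q_values(params, batch["features"]) - jax.lax.stop_gradient(target)
    return jnp.mean(td_error ** 2)


def functional_regularization_loss(params, lagging_params, batch, gamma=0.99, kappa=0.5):
    # Assumed form: bootstrap from the current parameters, with no gradient
    # flowing through the target...
    target = batch["reward"] + gamma * q_values(params, batch["next_features"])
    q = q_values(params, batch["features"])
    td_error = q - jax.lax.stop_gradient(target)
    # ...and penalize the squared distance, in function space, between the
    # current Q-values and those of the lagging parameters.
    q_lagging = jax.lax.stop_gradient(q_values(lagging_params, batch["features"]))
    return jnp.mean(td_error ** 2) + kappa * jnp.mean((q - q_lagging) ** 2)


# Toy batch of two transitions with two-dimensional features (made-up values).
batch = {
    "features": jnp.array([[1.0, 0.0], [0.0, 1.0]]),
    "next_features": jnp.array([[0.0, 1.0], [1.0, 0.0]]),
    "reward": jnp.array([1.0, 0.0]),
}
params = jnp.array([0.5, -0.5])
lagging_params = jnp.array([0.0, 0.0])

tn_loss = target_network_loss(params, lagging_params, batch)
fr_loss = functional_regularization_loss(params, lagging_params, batch)
fr_grad = jax.grad(functional_regularization_loss)(params, lagging_params, batch)
```

In this sketch the regularization weight `kappa` is a free knob that the Target Network loss lacks, and how often `lagging_params` is refreshed is left to the training loop. With `kappa=0`, the second loss reduces to a plain semi-gradient TD loss.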

Code & Models

Repositories

alexpiche/fr-tmlr · JAX · Official

Taxonomy

Topics: Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks