A Connection between One-Step Regularization and Critic Regularization in Reinforcement Learning

Benjamin Eysenbach; Matthieu Geist; Sergey Levine; Ruslan Salakhutdinov

arXiv:2307.12968·cs.LG·July 25, 2023·1 cites

TL;DR

This paper reveals a theoretical connection between one-step regularization and critic regularization in offline reinforcement learning, showing they can produce equivalent policies under certain conditions and analyzing their practical implications.

Contribution

It establishes a formal link between one-step and critic regularization methods, providing insights into their equivalence and practical performance in offline RL.

Findings

01

Applying critic regularization with coefficient 1 yields the same policy as one-step RL.

02

Practical implementations often violate the assumptions behind this equivalence, yet their behavior still aligns with the theoretical predictions.

03

One-step RL can be competitive with critic regularization in highly regularized RL problems.

Abstract

As with any machine learning problem with limited data, effective offline RL algorithms require careful regularization to avoid overfitting. One-step methods perform regularization by doing just a single step of policy improvement, while critic regularization methods do many steps of policy improvement with a regularized objective. These methods appear distinct. One-step methods, such as advantage-weighted regression and conditional behavioral cloning, truncate policy iteration after just one step. This "early stopping" makes one-step RL simple and stable, but can limit its asymptotic performance. Critic regularization typically requires more compute but has appealing lower-bound guarantees. In this paper, we draw a close connection between these methods: applying a multi-step critic regularization method with a regularization coefficient of 1 yields the same policy as one-step RL.…
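
To make "truncating policy iteration after just one step" concrete, here is a minimal tabular sketch, not the authors' implementation. It evaluates the Q-function of the behavior policy and then performs a single improvement step. The two-state toy MDP, the function names, and the greedy improvement rule are all assumptions chosen for illustration; methods named in the abstract, such as advantage-weighted regression, use a softer, weighted improvement step instead.

```python
def evaluate_policy(policy, next_state, reward, gamma, num_sweeps=200):
    """Iteratively compute Q^policy for a deterministic tabular MDP.

    policy[s][a]     : probability of action a in state s
    next_state[s][a] : state reached after taking a in s (toy assumption)
    reward[s][a]     : immediate reward
    """
    n_states, n_actions = len(reward), len(reward[0])
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(num_sweeps):
        new_q = [[0.0] * n_actions for _ in range(n_states)]
        for s in range(n_states):
            for a in range(n_actions):
                s2 = next_state[s][a]
                # Expected value of the next state under the evaluated policy.
                v_next = sum(policy[s2][b] * q[s2][b] for b in range(n_actions))
                new_q[s][a] = reward[s][a] + gamma * v_next
        q = new_q
    return q


def one_step_rl(behavior, next_state, reward, gamma):
    """One-step RL: evaluate the behavior policy, then improve exactly once."""
    q_beta = evaluate_policy(behavior, next_state, reward, gamma)
    improved = []
    for s in range(len(reward)):
        # Stay within the data support: only actions the behavior policy takes.
        candidates = [a for a in range(len(reward[s])) if behavior[s][a] > 0]
        improved.append(max(candidates, key=lambda a: q_beta[s][a]))
    return improved, q_beta


# Hypothetical toy MDP: staying in state 1 (action 0) is the only rewarded move.
NEXT_STATE = [[0, 1], [1, 0]]
REWARD = [[0.0, 0.0], [1.0, 0.0]]
BEHAVIOR = [[0.5, 0.5], [0.5, 0.5]]  # uniform data-collection policy
GAMMA = 0.5

one_step_policy, q_behavior = one_step_rl(BEHAVIOR, NEXT_STATE, REWARD, GAMMA)
print("one-step policy:", one_step_policy)
print("Q of behavior policy:", q_behavior)
```

A multi-step method would alternate evaluation and improvement until convergence, re-evaluating each new policy. The sketch stops after the first improvement, which is the "early stopping" the abstract refers to.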

Code & Models

Repositories

ben-eysenbach/ac-connection (official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Smart Grid Energy Management