A Connection between One-Step Regularization and Critic Regularization in Reinforcement Learning
Benjamin Eysenbach, Matthieu Geist, Sergey Levine, Ruslan Salakhutdinov

TL;DR
This paper reveals a theoretical connection between one-step regularization and critic regularization in offline reinforcement learning, showing they can produce equivalent policies under certain conditions and analyzing their practical implications.
Contribution
It establishes a formal link between one-step and critic regularization methods, providing insights into their equivalence and practical performance in offline RL.
Findings
Applying a multi-step critic regularization method with a regularization coefficient of 1 yields the same policy as one-step RL.
Practical implementations often violate the assumptions behind this result, yet their behavior still aligns with the theoretical predictions.
One-step RL can be competitive with critic regularization in highly regularized RL problems.
Abstract
As with any machine learning problem with limited data, effective offline RL algorithms require careful regularization to avoid overfitting. One-step methods perform regularization by doing just a single step of policy improvement, while critic regularization methods do many steps of policy improvement with a regularized objective. These methods appear distinct. One-step methods, such as advantage-weighted regression and conditional behavioral cloning, truncate policy iteration after just one step. This "early stopping" makes one-step RL simple and stable, but can limit its asymptotic performance. Critic regularization typically requires more compute but has appealing lower-bound guarantees. In this paper, we draw a close connection between these methods: applying a multi-step critic regularization method with a regularization coefficient of 1 yields the same policy as one-step RL.…
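The "single step of policy improvement" described in the abstract can be illustrated with a minimal tabular sketch. This is not the paper's implementation: the toy MDP (`P`, `R`), the uniform behavior policy, the function names, and the exponentiated-advantage update (in the style of advantage-weighted regression) are all assumptions chosen for illustration. The sketch evaluates the behavior policy's Q-function, then improves the policy exactly once and stops.

```python
import math

def evaluate_policy(P, R, policy, gamma=0.9, iters=500):
    """Iteratively compute Q^policy for a tabular MDP.

    P[s][a] is a list of next-state probabilities, R[s][a] is a scalar
    reward, and policy[s] is a list of action probabilities.
    """
    n_states, n_actions = len(P), len(P[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(iters):
        # State values under the fixed policy being evaluated.
        V = [sum(policy[s][a] * Q[s][a] for a in range(n_actions))
             for s in range(n_states)]
        # Bellman backup for that same fixed policy (no max over actions).
        Q = [[R[s][a] + gamma * sum(P[s][a][s2] * V[s2]
                                    for s2 in range(n_states))
              for a in range(n_actions)]
             for s in range(n_states)]
    return Q

def one_step_improvement(Q, behavior, temperature=1.0):
    """One improvement step: pi(a|s) proportional to beta(a|s) * exp(A(s,a) / T)."""
    new_policy = []
    for s, q_row in enumerate(Q):
        # Baseline: value of the behavior policy in state s.
        v = sum(b * q for b, q in zip(behavior[s], q_row))
        weights = [b * math.exp((q - v) / temperature)
                   for b, q in zip(behavior[s], q_row)]
        total = sum(weights)
        new_policy.append([w / total for w in weights])
    return new_policy

# Hypothetical toy MDP: 2 states, 2 actions; action a moves to state a.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[1.0, 0.0], [0.0, 1.0]]]
R = [[0.0, 1.0],
     [0.0, 2.0]]
behavior = [[0.5, 0.5], [0.5, 0.5]]  # assumed data-collecting policy

Q_beta = evaluate_policy(P, R, behavior)              # critic of the behavior policy
pi_one_step = one_step_improvement(Q_beta, behavior)  # improve once, then stop
```

In this toy problem the resulting policy shifts probability toward the higher-reward action in both states while keeping nonzero mass on every action the behavior policy takes. A multi-step method would instead re-evaluate the improved policy and repeat, which is where critic regularization comes in.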
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Smart Grid Energy Management
