Regularized Policies are Reward Robust
Hisham Husain, Kamil Ciosek, Ryota Tomioka

TL;DR
This paper shows that entropic regularization in reinforcement learning acts as a form of reward robustness: the regularized optimal policy is also optimal under a worst-case adversarial reward. This offers a new perspective on exploration and on the effects of regularization.
Contribution
The authors derive a dual formulation showing regularized RL as an adversarial reward problem, connecting regularization to robust policy optimization.
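To make the shape of such a dual concrete, here is a sketch for the simplest possible setting: a single state with a finite action set, so that a policy \pi is a point on the probability simplex \Delta. The symbols \Omega (a convex regularizer), \Omega^* (its convex conjugate), r' (the adversary's reward) and \tau (a temperature) are notation assumed here for illustration. The paper's general statement may be formulated differently, so this should be read as an illustration of the idea rather than as the authors' theorem.

```latex
% Assumed setting: one state, finite actions, \pi \in \Delta,
% \Omega closed and convex on \Delta.
\Omega^*(y) = \max_{\pi \in \Delta} \; \langle \pi, y \rangle - \Omega(\pi),
\qquad
\Omega(\pi) = \max_{y} \; \langle \pi, y \rangle - \Omega^*(y).

% Substitute the second identity into the regularized objective
% and rename r' = r - y:
\max_{\pi \in \Delta} \; \langle \pi, r \rangle - \Omega(\pi)
  = \max_{\pi \in \Delta} \min_{y} \; \langle \pi, r - y \rangle + \Omega^*(y)
  = \max_{\pi \in \Delta} \min_{r'} \; \langle \pi, r' \rangle + \Omega^*(r - r').

% Entropic case: \Omega(\pi) = \tau \sum_a \pi(a) \log \pi(a) gives
\Omega^*(y) = \tau \log \sum_a \exp\!\big( y(a) / \tau \big).
```

Under these assumptions the inner minimization plays the role of an adversary that picks a reward r', paying the penalty \Omega^*(r - r') for moving away from the original reward r.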
Findings
The optimal policy of a regularized objective is also an optimal policy of an RL problem under a worst-case adversarial reward.
Entropic regularization can be interpreted as a robustness mechanism against reward perturbations.
The framework applies to various regularization schemes beyond entropy.
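The findings above can be checked numerically in a toy case. The sketch below assumes a single-state problem (a bandit) with entropy regularization at temperature `tau`, where the regularized objective is `<pi, r> + tau * H(pi)` and the adversarial form is `<pi, r_adv> + tau * logsumexp((r - r_adv) / tau)` minimized over `r_adv`. The function names, the reward vector, and the single-state simplification are all assumptions made for illustration; this is not the paper's general construction.

```python
import numpy as np


def logsumexp(x):
    # Numerically stable log(sum(exp(x))).
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))


def regularized_objective(pi, r, tau):
    # Expected reward plus tau times the Shannon entropy of pi.
    return float(pi @ r - tau * np.sum(pi * np.log(pi)))


def robust_objective(pi, r, r_adv, tau):
    # Expected adversarial reward plus the conjugate penalty on r - r_adv.
    return float(pi @ r_adv + tau * logsumexp((r - r_adv) / tau))


def worst_case_reward(pi, r, tau):
    # A minimizer of robust_objective over r_adv for a strictly positive pi
    # (unique only up to an additive constant).
    return r - tau * np.log(pi)


# Hypothetical three-action reward vector and temperature.
r = np.array([1.0, 0.5, -0.2])
tau = 0.5

# Softmax policy, the maximizer of the entropy-regularized objective.
pi_star = np.exp(r / tau - logsumexp(r / tau))
r_adv_star = worst_case_reward(pi_star, r, tau)

reg_value = regularized_objective(pi_star, r, tau)
rob_value = robust_objective(pi_star, r, r_adv_star, tau)

# Any other adversarial reward can only give a larger robust objective.
rng = np.random.default_rng(0)
other_values = [
    robust_objective(pi_star, r, rng.normal(size=r.shape), tau)
    for _ in range(100)
]
```

In this toy case `reg_value` and `rob_value` agree, and the worst-case reward at the softmax policy is the same for every action, so the adversary leaves the policy indifferent between actions.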
Abstract
Entropic regularization of policies in Reinforcement Learning (RL) is a commonly used heuristic to ensure that the learned policy explores the state-space sufficiently before overfitting to a locally optimal policy. The primary motivation for using entropy is exploration and disambiguating optimal policies; however, the theoretical effects are not entirely understood. In this work, we study the more general regularized RL objective and, using Fenchel duality, derive the dual problem, which takes the form of an adversarial reward problem. In particular, we find that the optimal policy found by a regularized objective is precisely an optimal policy of a reinforcement learning problem under a worst-case adversarial reward. Our result allows us to reinterpret the popular entropic regularization scheme as a form of robustification. Furthermore, due to the generality of our results, we…
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Gene Regulatory Network Analysis
