Regularized Policies are Reward Robust

Hisham Husain; Kamil Ciosek; Ryota Tomioka

arXiv:2101.07012 · cs.LG · January 19, 2021 · 1 citation

Open Access

TL;DR

This paper shows that entropic regularization in reinforcement learning acts as a form of reward robustness: a policy that is optimal for the regularized objective is also optimal under a worst-case adversarial reward. This gives a robustness-based reading of regularization alongside the usual exploration motivation.

Contribution

Using Fenchel duality, the authors derive the dual of the general regularized RL objective. The dual takes the form of an adversarial reward problem, which connects regularization to robust policy optimization.

Findings

01 Optimizing a regularized objective is equivalent to solving a worst-case adversarial reward problem.

02 Entropic regularization can be interpreted as a robustness mechanism against reward perturbations.

03 The framework applies to regularization schemes other than entropy.

Abstract

Entropic regularization of policies in Reinforcement Learning (RL) is a commonly used heuristic to ensure that the learned policy explores the state-space sufficiently before overfitting to a locally optimal policy. The primary motivation for using entropy is for exploration and disambiguating optimal policies; however, the theoretical effects are not entirely understood. In this work, we study the more general regularized RL objective and, using Fenchel duality, derive the dual problem, which takes the form of an adversarial reward problem. In particular, we find that the optimal policy found by a regularized objective is precisely an optimal policy of a reinforcement learning problem under a worst-case adversarial reward. Our result allows us to reinterpret the popular entropic regularization scheme as a form of robustification. Furthermore, due to the generality of our results, we…
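The duality described in the abstract can be checked numerically in the simplest possible setting. The following is a minimal sketch, assuming a single-state (bandit) problem with entropy regularization at temperature `tau`; it is not the paper's general derivation. The reward vector, the temperature, and all function names are hypothetical. It relies on the standard fact that the convex conjugate of scaled negative entropy over the simplex is a scaled log-sum-exp, so the regularized value of a policy equals the minimum over adversarial rewards of the expected adversarial reward plus a penalty on how far the adversary moves from the true reward.

```python
import math

def logsumexp(xs):
    # Numerically stable log(sum(exp(x))).
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def softmax(xs):
    lse = logsumexp(xs)
    return [math.exp(x - lse) for x in xs]

def regularized_value(pi, r, tau):
    # <pi, r> + tau * H(pi): the entropy-regularized objective.
    entropy = -sum(p * math.log(p) for p in pi if p > 0)
    return sum(p * ri for p, ri in zip(pi, r)) + tau * entropy

def conjugate(y, tau):
    # Convex conjugate of tau * (negative entropy) over the simplex.
    return tau * logsumexp([yi / tau for yi in y])

def worst_case_reward(pi, r, tau):
    # Adversarial reward minimizing <pi, r_adv> + conjugate(r - r_adv).
    # Assumes pi has full support (every probability is positive).
    return [ri - tau * math.log(p) for p, ri in zip(pi, r)]

def adversarial_value(pi, r_adv, r, tau):
    # Expected adversarial reward plus the penalty on the perturbation.
    penalty = conjugate([ri - ai for ri, ai in zip(r, r_adv)], tau)
    return sum(p * a for p, a in zip(pi, r_adv)) + penalty

r = [1.0, 0.5, -0.2]   # hypothetical rewards for three actions
tau = 0.7              # hypothetical regularization strength

pi_star = softmax([ri / tau for ri in r])      # regularized optimal policy
r_adv = worst_case_reward(pi_star, r, tau)     # adversary's best response
primal = regularized_value(pi_star, r, tau)
dual = adversarial_value(pi_star, r_adv, r, tau)

print("regularized value:", primal)
print("adversarial value:", dual)
print("worst-case reward:", r_adv)
```

In this toy case the two values coincide, and the worst-case reward comes out constant across actions: the adversary makes the regularized optimal policy indifferent between actions. Any other adversarial reward, such as the unperturbed `r` itself, gives a value at least as large.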

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Gene Regulatory Network Analysis