Maximum Entropy RL (Provably) Solves Some Robust RL Problems
Benjamin Eysenbach, Sergey Levine

TL;DR
This paper proves that maximum entropy (MaxEnt) reinforcement learning maximizes a lower bound on a robust RL objective, so it is robust to some disturbances in the dynamics and the reward function without extra modifications.
Contribution
To the best of the authors' knowledge, the work offers the first rigorous proof and theoretical characterization of the MaxEnt RL robust set, that is, the set of disturbances to the dynamics and reward function to which MaxEnt RL is robust.
Findings
MaxEnt RL maximizes a lower bound on a robust RL objective.
MaxEnt RL is robust to certain disturbances without additional modifications.
The paper provides formal guarantees for MaxEnt RL's robustness by characterizing the set of disturbances it covers.
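For orientation, the two objectives named in the findings above can be written in their standard textbook forms. This is a sketch in assumed notation, not the paper's exact statement: the temperature \alpha, the uncertainty sets \tilde{\mathcal{P}} and \tilde{\mathcal{R}}, and the horizon T are symbols chosen here for illustration, and the precise robust set and the constants in the bound are given in the paper and are not reproduced.

```latex
% MaxEnt RL objective: expected reward plus a policy-entropy bonus,
% evaluated under the nominal dynamics p and reward r.
J_{\text{MaxEnt}}(\pi)
  = \mathbb{E}_{\pi,\, p}\!\left[ \sum_{t=1}^{T} r(s_t, a_t)
      + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]

% Generic robust RL objective: worst-case expected return over a set of
% perturbed dynamics \tilde{p} and perturbed rewards \tilde{r}.
J_{\text{robust}}(\pi)
  = \min_{\tilde{p} \in \tilde{\mathcal{P}},\ \tilde{r} \in \tilde{\mathcal{R}}}
    \mathbb{E}_{\pi,\, \tilde{p}}\!\left[ \sum_{t=1}^{T} \tilde{r}(s_t, a_t) \right]
```

Read this way, the first finding says that maximizing the first quantity maximizes a lower bound on a quantity of the second form, for a particular choice of uncertainty set.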
Abstract
Many potential applications of reinforcement learning (RL) require guarantees that the agent will perform well in the face of disturbances to the dynamics or reward function. In this paper, we prove theoretically that maximum entropy (MaxEnt) RL maximizes a lower bound on a robust RL objective, and thus can be used to learn policies that are robust to some disturbances in the dynamics and the reward function. While this capability of MaxEnt RL has been observed empirically in prior work, to the best of our knowledge our work provides the first rigorous proof and theoretical characterization of the MaxEnt RL robust set. While a number of prior robust RL algorithms have been designed to handle similar disturbances to the reward function or dynamics, these methods typically require additional moving parts and hyperparameters on top of a base RL algorithm. In contrast, our results suggest…
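As a small illustration of what "maximum entropy" means operationally, the sketch below works through the single-step (bandit) case, where the MaxEnt-optimal policy is a softmax over action values and its entropy-regularized return equals a log-sum-exp "soft value". This is not code from the paper: the function names, the example action values, and the temperature `alpha` are all hypothetical choices made for illustration.

```python
import math


def maxent_policy(q_values, alpha):
    """Softmax policy with pi(a) proportional to exp(Q(a) / alpha)."""
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / alpha) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def soft_value(q_values, alpha):
    """Soft value: alpha * log(sum_a exp(Q(a) / alpha))."""
    m = max(q_values)
    return m + alpha * math.log(sum(math.exp((q - m) / alpha) for q in q_values))


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_regularized_return(q_values, probs, alpha):
    """Expected action value plus alpha times the policy entropy."""
    expected_q = sum(p * q for p, q in zip(probs, q_values))
    return expected_q + alpha * entropy(probs)


# Hypothetical action values: two near-optimal actions and one poor one.
q_values = [1.0, 0.9, -2.0]
alpha = 0.5  # assumed temperature; larger values give a more random policy

policy = maxent_policy(q_values, alpha)
print("policy:", policy)
print("soft value:", soft_value(q_values, alpha))
print("regularized return:", entropy_regularized_return(q_values, policy, alpha))
```

The policy keeps substantial probability on both near-optimal actions instead of committing to one. That stochasticity is the property the paper's analysis connects to robustness; the sketch only shows the mechanism and does not demonstrate the robustness result itself.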
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning
