A unified view of entropy-regularized Markov decision processes
Gergely Neu, Anders Jonsson, Vicenç Gómez

TL;DR
This paper introduces a unified framework for entropy-regularized average-reward reinforcement learning in MDPs. It casts several existing algorithms as instances of convex-regularized policy optimization and uses that view to analyze their convergence properties.
Contribution
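It extends the linear-programming formulation of policy optimization to convex regularization functions, formalizes several entropy-regularized algorithms as approximate variants of Mirror Descent or Dual Averaging, and analyzes their convergence along with the empirical effects of regularization.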
Findings
Exact TRPO converges to the optimal policy.
Entropy-regularized policy gradient methods may not converge.
Regularization impacts learning performance in simple RL setups.
Abstract
We propose a general framework for entropy-regularized average-reward reinforcement learning in Markov decision processes (MDPs). Our approach is based on extending the linear-programming formulation of policy optimization in MDPs to accommodate convex regularization functions. Our key result is showing that using the conditional entropy of the joint state-action distributions as regularization yields a dual optimization problem closely resembling the Bellman optimality equations. This result enables us to formalize a number of state-of-the-art entropy-regularized reinforcement learning algorithms as approximate variants of Mirror Descent or Dual Averaging, and thus to argue about the convergence properties of these methods. In particular, we show that the exact version of the TRPO algorithm of Schulman et al. (2015) actually converges to the optimal policy, while the…
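To make the "dual resembling the Bellman optimality equations" concrete, here is a minimal sketch of a log-sum-exp ("soft") Bellman backup in the average-reward setting, solved by relative value iteration. It is an illustration under stated assumptions, not the paper's algorithm: the reference policy is assumed uniform, relative value iteration is a standard technique swapped in for simplicity, and the names `soft_backup`, `soft_relative_value_iteration`, `eta`, and the toy MDP are all hypothetical. The exact form of the equations in the paper may differ.

```python
import numpy as np

def soft_backup(P, r, V, eta):
    """One entropy-regularized Bellman backup (uniform reference policy assumed).

    P: (S, A, S) transition probabilities, r: (S, A) rewards,
    V: (S,) value estimates, eta: inverse regularization strength.
    """
    Q = r + P @ V                          # Q(x,a) = r(x,a) + sum_y P(y|x,a) V(y)
    m = Q.max(axis=1, keepdims=True)       # shift for numerical stability
    # (1/eta) * log( mean_a exp(eta * Q(x,a)) ), a smooth stand-in for max_a
    soft_V = m[:, 0] + np.log(np.mean(np.exp(eta * (Q - m)), axis=1)) / eta
    return Q, soft_V

def soft_relative_value_iteration(P, r, eta=1.0, iters=1000):
    """Iterate the soft backup, subtracting the value at state 0 each step.

    Returns the relative values V (with V[0] == 0), the regularized gain,
    and the softmax policy pi(a|x) proportional to exp(eta * Q(x,a)).
    """
    V = np.zeros(r.shape[0])
    gain = 0.0
    for _ in range(iters):
        _, TV = soft_backup(P, r, V, eta)
        gain = TV[0]                       # gain estimate, since V[0] == 0
        V = TV - gain                      # keep values anchored at state 0
    Q, _ = soft_backup(P, r, V, eta)
    w = np.exp(eta * (Q - Q.max(axis=1, keepdims=True)))
    pi = w / w.sum(axis=1, keepdims=True)
    return V, gain, pi

# Hypothetical 2-state, 2-action MDP, used only for illustration.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.1, 0.9]]])
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])

V, gain, pi = soft_relative_value_iteration(P, r, eta=2.0)
```

As `eta` grows, the log-mean-exp approaches a hard maximum over actions and the backup approaches the usual average-reward Bellman optimality operator. Small `eta` keeps the resulting policy close to the uniform reference.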
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Adaptive Dynamic Programming Control
Methods: Trust Region Policy Optimization
