Do You Need the Entropy Reward (in Practice)?
Haonan Yu, Haichao Zhang, Wei Xu

TL;DR
This paper critically examines the role of entropy rewards in MaxEnt RL, revealing potential pitfalls in policy evaluation and proposing simplified approaches that improve practical performance and robustness.
Contribution
It provides empirical insights into when and how entropy rewards should be used in policy evaluation, and introduces simplified methods like SACZero and SACLite for better practical outcomes.
Findings
Entropy rewards can obscure main task rewards if not managed properly.
Using entropy regularization only in policy improvement can match or outperform full MaxEnt RL.
Normalizing or removing entropy rewards from evaluation improves robustness and performance.
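The three findings above can be made concrete with a minimal sketch of a one-step critic target, assuming the standard SAC form y = r + γ(1 − d)(Q(s′, a′) − α log π(a′|s′)). The function name `td_targets`, the `entropy_mode` switch, and the batch-mean normalization are hypothetical illustrations. They are not taken from the paper and should not be read as the authors' SACZero or SACLite implementations.

```python
from statistics import fmean


def td_targets(rewards, next_q, next_log_probs, dones,
               gamma=0.99, alpha=0.2, entropy_mode="full"):
    """Compute one-step critic targets for a batch of transitions.

    entropy_mode (hypothetical switch, for illustration only):
      "full":      SAC-style target, entropy reward -alpha * log pi included
      "zero_mean": entropy reward shifted to have zero mean over the batch
      "none":      entropy reward removed from policy evaluation
    """
    # Per-sample entropy reward estimate at the next state.
    ent = [-alpha * lp for lp in next_log_probs]

    if entropy_mode == "zero_mean":
        mean_ent = fmean(ent)
        ent = [e - mean_ent for e in ent]
    elif entropy_mode == "none":
        ent = [0.0] * len(ent)
    elif entropy_mode != "full":
        raise ValueError(f"unknown entropy_mode: {entropy_mode!r}")

    # Bootstrapped target; (1 - done) masks the bootstrap at terminal steps.
    return [r + gamma * (1.0 - d) * (q + e)
            for r, q, e, d in zip(rewards, next_q, ent, dones)]


if __name__ == "__main__":
    batch = dict(rewards=[1.0, 0.0], next_q=[2.0, 4.0],
                 next_log_probs=[-1.0, -3.0], dones=[0.0, 1.0])
    for mode in ("full", "zero_mean", "none"):
        print(mode, td_targets(**batch, gamma=0.5, alpha=0.5, entropy_mode=mode))
```

In all three modes the actor loss could still carry the α log π term, so entropy would keep regularizing policy improvement; only the critic target changes.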
Abstract
Maximum entropy (MaxEnt) RL maximizes a combination of the original task reward and an entropy reward. It is believed that the regularization imposed by entropy, on both policy improvement and policy evaluation, together contributes to good exploration, training convergence, and robustness of learned policies. This paper takes a closer look at entropy as an intrinsic reward, by conducting various ablation studies on soft actor-critic (SAC), a popular representative of MaxEnt RL. Our findings reveal that in general, entropy rewards should be applied with caution to policy evaluation. On one hand, the entropy reward, like any other intrinsic reward, could obscure the main task reward if it is not properly managed. We identify some failure cases of the entropy reward especially in episodic Markov decision processes (MDPs), where it could cause the policy to be overly optimistic or…
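For reference, the "combination of the original task reward and an entropy reward" is commonly written as below. This is the standard textbook form of the MaxEnt objective, not a formula quoted from this page; the discount factor γ and the temperature α are notational assumptions.

```latex
J(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t} \gamma^{t}
  \Bigl( r(s_t, a_t) + \alpha\, \mathcal{H}\bigl(\pi(\cdot \mid s_t)\bigr) \Bigr)\right],
\qquad
\mathcal{H}\bigl(\pi(\cdot \mid s)\bigr) = -\,\mathbb{E}_{a \sim \pi(\cdot \mid s)}\bigl[\log \pi(a \mid s)\bigr].
```

Because the entropy term is collected at every step, its total contribution in an episodic MDP depends on how long the episode lasts, which is one reason it can interact with the task reward like any other intrinsic reward.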
Taxonomy
Topics: Reinforcement Learning in Robotics · Neural and Behavioral Psychology Studies · Explainable Artificial Intelligence (XAI)
Methods: Entropy Regularization
