An Entropy Regularization Free Mechanism for Policy-based Reinforcement Learning
Changnan Xiao, Haosen Shi, Jiajun Fan, Shihong Deng

TL;DR
This paper introduces a mechanism for policy-based reinforcement learning that needs no entropy regularization yet achieves Closed-form Diversity, Objective-invariant Exploration, and Adaptive Trade-off, lifting a policy-based baseline to state-of-the-art results on the Arcade Learning Environment.
Contribution
It proposes an entropy regularization free mechanism for policy-based RL that attains three exploration and diversity characteristics previously achieved together only by value-based methods with ε-greedy.
Findings
The mechanism is highly sample-efficient for policy-based methods.
It boosts a policy-based baseline to state-of-the-art performance on the Arcade Learning Environment.
It outperforms existing policy-based methods while using no entropy regularization.
Abstract
Policy-based reinforcement learning methods suffer from the policy collapse problem. We find that value-based reinforcement learning methods with the ε-greedy mechanism enjoy three characteristics, Closed-form Diversity, Objective-invariant Exploration, and Adaptive Trade-off, which help them avoid the policy collapse problem. However, no parallel mechanism exists for policy-based methods that achieves all three characteristics. In this paper, we propose an entropy regularization free mechanism designed for policy-based methods that achieves Closed-form Diversity, Objective-invariant Exploration, and Adaptive Trade-off. Our experiments show that our mechanism is highly sample-efficient for policy-based methods and boosts a policy-based baseline to a new state of the art on the Arcade Learning Environment.
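For readers unfamiliar with the value-based mechanism the abstract refers to, the following is a minimal sketch of the standard ε-greedy policy, not the paper's proposed mechanism. The function names are hypothetical and chosen for illustration. The abstract does not define the three characteristics, so reading the fixed probability floor ε/|A| as an instance of "closed-form diversity" is an assumption rather than the authors' definition.

```python
import random


def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities of the standard epsilon-greedy policy.

    Every action gets epsilon / n; the greedy action gets the
    remaining 1 - epsilon on top of that.
    """
    n = len(q_values)
    # Ties are broken in favor of the lowest action index.
    greedy = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n
    probs[greedy] += 1.0 - epsilon
    return probs


def sample_action(q_values, epsilon, rng=random):
    """Draw one action index from the epsilon-greedy distribution."""
    probs = epsilon_greedy_probs(q_values, epsilon)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical Q-values for a three-action state.
probs = epsilon_greedy_probs([0.1, 0.7, 0.2], epsilon=0.3)
```

With ε = 0.3 and three actions, each action keeps a probability of about 0.1 however the Q-values change, and the greedy action receives about 0.8. Because ε is set independently of the value-learning objective, exploration here does not alter what is being optimized, which is presumably the contrast with adding an entropy term to a policy-gradient loss.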
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Smart Grid Energy Management
Methods: Entropy Regularization
