An Entropy Regularization Free Mechanism for Policy-based Reinforcement Learning

Changnan Xiao; Haosen Shi; Jiajun Fan; Shihong Deng

arXiv:2106.00707·cs.LG·June 3, 2021·1 cites

Open Access

TL;DR

This paper introduces a mechanism for policy-based reinforcement learning that needs no entropy regularization yet provides closed-form diversity, objective-invariant exploration, and an adaptive trade-off, yielding state-of-the-art results on the Arcade Learning Environment.

Contribution

It proposes an entropy regularization free mechanism for policy-based RL that attains the exploration and diversity characteristics previously seen only in value-based methods with an ε-greedy mechanism.

Findings

01. The mechanism is highly sample-efficient.

02. Achieves state-of-the-art performance on the Arcade Learning Environment.

03. Outperforms existing policy-based methods without using entropy regularization.

Abstract

Policy-based reinforcement learning methods suffer from the policy collapse problem. We find that value-based reinforcement learning methods with an ε-greedy mechanism enjoy three characteristics, Closed-form Diversity, Objective-invariant Exploration and Adaptive Trade-off, which help value-based methods avoid the policy collapse problem. However, no parallel mechanism exists for policy-based methods that achieves all three characteristics. In this paper, we propose an entropy regularization free mechanism designed for policy-based methods that achieves Closed-form Diversity, Objective-invariant Exploration and Adaptive Trade-off. Our experiments show that our mechanism is highly sample-efficient for policy-based methods and boosts a policy-based baseline to a new state of the art on the Arcade Learning Environment.
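
The abstract names the three characteristics without defining them, so the following is only a minimal sketch of the standard ε-greedy distribution it refers to, not the paper's proposed mechanism. It illustrates, under that assumption, why ε-greedy keeps a fixed floor of ε/|A| on every action regardless of how the value estimates are scaled, while a plain softmax policy over the same numbers can collapse to a near-deterministic choice. The function names and the example values are hypothetical.

```python
import math


def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities of the standard epsilon-greedy policy.

    With probability 1 - epsilon the greedy action is taken; with
    probability epsilon an action is drawn uniformly from all actions.
    """
    n = len(q_values)
    greedy = max(range(n), key=lambda a: q_values[a])  # first argmax on ties
    probs = [epsilon / n] * n          # uniform exploration mass
    probs[greedy] += 1.0 - epsilon     # remaining mass on the greedy action
    return probs


def softmax(logits):
    """Numerically stable softmax, used here as a stand-in policy head."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


if __name__ == "__main__":
    q_small = [1.0, 5.0, 2.0]          # illustrative values only
    q_large = [100.0, 500.0, 200.0]    # same ordering, 100x the scale

    for q in (q_small, q_large):
        eg = epsilon_greedy_probs(q, epsilon=0.1)
        sm = softmax(q)
        print("values:", q)
        print("  epsilon-greedy:", eg, "entropy:", entropy(eg))
        print("  softmax       :", sm, "entropy:", entropy(sm))
```

In this sketch the ε-greedy probabilities are identical for both value vectors, because they depend only on which action is greedy, whereas the softmax entropy shrinks toward zero as the gaps between values grow.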

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Smart Grid Energy Management

Methods: Entropy Regularization