Entropy-Augmented Entropy-Regularized Reinforcement Learning and a Continuous Path from Policy Gradient to Q-Learning

Donghoon Lee

arXiv:2005.08844 · cs.LG · June 8, 2020 · 1 citation

Open Access

TL;DR

This paper introduces an entropy-regularized reinforcement learning framework with a continuously parameterized algorithm that interpolates between policy gradient and Q-learning. Experiments indicate that an intermediate setting can yield a performance gain.

Contribution

It reformulates entropy augmentation and adds a KL-divergence regularization term to the objective, yielding a single continuously parameterized algorithm whose extreme limits are policy gradient and Q-learning.

Findings

01 Performance gains observed with intermediate algorithms.

02 Monotonic policy improvement through entropy regularization.

03 Unified approach bridging policy gradient and Q-learning.

Abstract

Entropy augmented to reward is known to soften the greedy argmax policy to a softmax policy. Entropy augmentation is reformulated, which motivates introducing an additional entropy term to the objective function in the form of a KL-divergence to regularize the optimization process. This results in a policy which monotonically improves while interpolating from the current policy to the softmax greedy policy. This policy is used to build a continuously parameterized algorithm which optimizes the policy and the Q-function simultaneously and whose extreme limits correspond to policy gradient and Q-learning, respectively. Experiments show that there can be a performance gain using an intermediate algorithm.
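
The interpolation described in the abstract can be illustrated for a single state with a small sketch. It assumes the textbook objective E_pi[Q] + alpha * H(pi) - beta * KL(pi || prior), whose maximizer is proportional to prior^(beta / (alpha + beta)) * exp(Q / (alpha + beta)). The names `interpolated_policy`, `soft_value`, `alpha`, `beta`, and `prior` are hypothetical, and this parameterization is an assumption made for illustration; the paper's own parameterization may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def interpolated_policy(q_values, prior, alpha, beta):
    """Maximizer of E_pi[Q] + alpha*H(pi) - beta*KL(pi || prior) for one state.

    Assumption (not taken from the paper): alpha > 0 is the entropy
    temperature, beta >= 0 is the KL weight, and prior is strictly positive.
    beta = 0 gives softmax(Q / alpha); beta -> infinity returns the prior.
    """
    scale = alpha + beta
    logits = [(q + beta * math.log(p)) / scale for q, p in zip(q_values, prior)]
    return softmax(logits)


def soft_value(q_values, policy, alpha):
    """Entropy-augmented value E_pi[Q] + alpha*H(pi) of a policy at one state."""
    return sum(p * (q - alpha * math.log(p)) for q, p in zip(q_values, policy))


q = [1.0, 2.0, 0.5]        # illustrative Q-values, not from the paper
prior = [0.2, 0.3, 0.5]    # illustrative current policy
for beta in (0.0, 1.0, 10.0):
    pi = interpolated_policy(q, prior, alpha=1.0, beta=beta)
    print(beta, [round(p, 3) for p in pi], round(soft_value(q, pi, 1.0), 3))
```

Under this assumed objective, every intermediate `beta` gives a policy whose entropy-augmented value is at least that of `prior`, because the maximizer's regularized objective cannot fall below the value obtained by keeping `prior` unchanged, for which the KL term is zero.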

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Smart Grid Energy Management

Methods: Softmax