Entropy-Augmented Entropy-Regularized Reinforcement Learning and a Continuous Path from Policy Gradient to Q-Learning
Donghoon Lee

TL;DR
This paper introduces a novel entropy-regularized reinforcement learning framework that interpolates between policy gradient and Q-learning, demonstrating potential performance improvements through a continuous parameterization.
Contribution
It reformulates entropy augmentation, adds a KL-divergence regularizer to the objective, and builds a unified algorithm that transitions continuously from policy gradient to Q-learning.
Findings
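Experiments show that an intermediate algorithm between the two limits can yield a performance gain.
The KL-regularized policy improves monotonically while interpolating from the current policy to the softmax greedy policy.
A single continuously parameterized algorithm has policy gradient and Q-learning as its extreme limits.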
Abstract
Augmenting the reward with an entropy term is known to soften the greedy argmax policy into a softmax policy. Reformulating this entropy augmentation motivates adding a further entropy term to the objective function, in the form of a KL-divergence, to regularize the optimization process. The result is a policy that improves monotonically while interpolating from the current policy to the softmax greedy policy. This policy is used to build a continuously parameterized algorithm that optimizes the policy and the Q-function simultaneously and whose extreme limits correspond to policy gradient and Q-learning, respectively. Experiments show that an intermediate algorithm can yield a performance gain.
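The interpolation described in the abstract can be illustrated for a single state with a finite action set. The sketch below is not taken from the paper: it assumes the per-state objective E_pi[Q] + alpha * H(pi) - beta * KL(pi || pi_current), whose maximizer is proportional to pi_current^(1-c) * pi_soft^c with c = alpha / (alpha + beta) and pi_soft the softmax of Q / alpha. The names softmax, interpolated_policy, alpha, and beta are hypothetical, and the paper's own parameterization may differ.

```python
import math


def softmax(q_values, temperature):
    """Softmax policy proportional to exp(Q / temperature), computed stably."""
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def interpolated_policy(current_policy, q_values, alpha, beta):
    """Maximizer of E_pi[Q] + alpha * H(pi) - beta * KL(pi || current_policy).

    Assumed objective (not confirmed by the source). The result is a
    geometric mixture of the current policy and the softmax policy.
    """
    c = alpha / (alpha + beta)  # c = 1: softmax policy, c -> 0: current policy
    soft = softmax(q_values, alpha)
    weights = [(p ** (1.0 - c)) * (s ** c) for p, s in zip(current_policy, soft)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    current = [1.0 / 3.0] * 3   # uniform current policy
    q = [1.0, 2.0, 3.0]         # illustrative Q-values for one state
    for beta in (0.0, 1.0, 100.0):
        print(beta, interpolated_policy(current, q, alpha=1.0, beta=beta))
```

Under this assumed objective, beta = 0 recovers the softmax policy and a very large beta keeps the policy close to the current one, which mirrors the two extremes the abstract associates with Q-learning and policy gradient.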
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Smart Grid Energy Management
Methods: Softmax
