Entropy Augmented Reinforcement Learning

Jianfei Ma

arXiv:2208.09322 · cs.LG · March 6, 2023

TL;DR

This paper introduces an entropy augmentation technique for reinforcement learning that improves exploration and performance, particularly when combined with on-policy algorithms that use a value critic.

Contribution

It proposes a novel entropy augmentation method that satisfies the soft policy improvement theorem and improves exploration in reinforcement learning.
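The soft policy improvement theorem mentioned here comes from soft actor-critic (SAC). Below is a minimal sketch of its key step for a single state with discrete actions, assuming a known soft Q-function. The Boltzmann policy pi_new(a|s) ∝ exp(Q(s,a)/α) maximizes E_{a~π}[Q(s,a) − α log π(a|s)], so on that objective it does at least as well as any other policy. The function names, Q-values, temperature α, and old policy are illustrative assumptions. This is not the paper's augmentation method, and it shows only the one-step inequality, not the full theorem.

```python
import numpy as np

def soft_improve(q_values: np.ndarray, alpha: float) -> np.ndarray:
    """Hypothetical helper: Boltzmann policy pi_new(a|s) proportional to exp(Q(s,a) / alpha)."""
    logits = q_values / alpha
    logits = logits - logits.max()            # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

def soft_objective(policy: np.ndarray, q_values: np.ndarray, alpha: float) -> float:
    """Hypothetical helper: E_{a~pi}[Q(s,a) - alpha * log pi(a|s)] for one state."""
    mask = policy > 0                         # treat 0 * log 0 as 0
    p, q = policy[mask], q_values[mask]
    return float(np.sum(p * (q - alpha * np.log(p))))

q = np.array([1.0, 2.0, 0.5])                 # assumed Q(s, .) under the old policy
alpha = 0.5                                   # assumed temperature coefficient
pi_old = np.array([0.6, 0.2, 0.2])            # arbitrary old policy
pi_new = soft_improve(q, alpha)

# The Boltzmann policy maximizes the soft objective, so it cannot score lower.
print(soft_objective(pi_old, q, alpha), soft_objective(pi_new, q, alpha))
```

The maximum of the soft objective equals α·log Σ_a exp(Q(s,a)/α), the soft state value, which is what pi_new attains.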

Findings

1. Achieves higher rewards on MuJoCo benchmark tasks

2. Balances exploration and exploitation through control of the temperature coefficient

3. Enhances the exploration bonus in custom environments
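To illustrate how a temperature coefficient trades exploration against exploitation, the sketch below assumes a Boltzmann policy over hypothetical Q-values and reports its entropy as α varies. A small α gives a nearly greedy policy; a large α approaches the uniform policy, whose entropy is log 3 for three actions. The helper names and numbers are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def boltzmann(q_values: np.ndarray, alpha: float) -> np.ndarray:
    """Hypothetical helper: softmax of Q / alpha."""
    logits = q_values / alpha
    logits = logits - logits.max()            # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def entropy(p: np.ndarray) -> float:
    """Shannon entropy in nats, ignoring zero-probability actions."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

q = np.array([1.0, 2.0, 0.5])                 # assumed Q(s, .) values
for alpha in (0.01, 0.1, 1.0, 10.0, 100.0):
    p = boltzmann(q, alpha)
    print(f"alpha={alpha:>6}: greedy prob={p.max():.3f}, entropy={entropy(p):.3f}")
```

Entropy grows monotonically with α, so the temperature has to be tuned: too small suppresses exploration, while too large makes the policy nearly uniform regardless of the Q-values.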

Abstract

Deep reinforcement learning was propelled by the emergence of trust region methods, which are scalable and efficient. However, the pessimism of such algorithms, which force the policy to remain within a trust region at all costs, has been shown to suppress exploration and harm performance. Exploratory algorithms such as SAC, while using entropy to encourage exploration, nevertheless implicitly optimize a different objective. We first identify this inconsistency and then propose an analogous augmentation technique that combines well with on-policy algorithms when a value critic is involved. Surprisingly, the proposed method consistently satisfies the soft policy improvement theorem while being more extensible. As the analysis suggests, controlling the temperature coefficient is crucial for balancing exploration and exploitation. Empirical tests on MuJoCo benchmark…
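The abstract does not spell out the augmentation formula. As a hedged sketch of one common way to combine an entropy term with an on-policy learner that uses a value critic, the code below folds a SAC-style bonus −α log π(a_t|s_t) into each reward before computing generalized advantage estimates (GAE). The function name `entropy_augmented_gae`, its argument layout, and the default hyperparameters are hypothetical; the paper's actual augmentation may differ.

```python
import numpy as np

def entropy_augmented_gae(rewards, log_probs, values, last_value, dones,
                          alpha=0.01, gamma=0.99, lam=0.95):
    """Hypothetical sketch: GAE computed on soft rewards r_t - alpha * log pi(a_t|s_t).

    rewards, log_probs, values, dones: length-T sequences from one rollout,
    where dones[t] = 1 means the episode ended after step t.
    last_value: critic estimate V(s_T) used to bootstrap the final step.
    """
    rewards = np.asarray(rewards, dtype=float)
    log_probs = np.asarray(log_probs, dtype=float)
    values = np.asarray(values, dtype=float)
    dones = np.asarray(dones, dtype=float)

    soft_rewards = rewards - alpha * log_probs   # entropy bonus folded into the reward
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(T)):
        not_done = 1.0 - dones[t]
        # TD error of the critic with respect to the soft reward
        delta = soft_rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    returns = advantages + values                # regression targets for the critic
    return advantages, returns
```

Setting `alpha=0` recovers standard GAE, so the temperature directly controls how strongly the entropy bonus shapes the advantages passed to a PPO- or TRPO-style update.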

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Memory and Neural Computing · Adversarial Robustness in Machine Learning

Methods: Dilated Convolution · Global Average Pooling · Convolution · Average Pooling · 1x1 Convolution · Switchable Atrous Convolution · Entropy Regularization · Trust Region Policy Optimization · Proximal Policy Optimization