Off-Policy Actor-Critic in an Ensemble: Achieving Maximum General Entropy and Effective Environment Exploration in Deep Reinforcement Learning
Gang Chen, Yiming Peng

TL;DR
This paper extends soft policy iteration to include generalized entropy measures, introducing new algorithms TAC and RAC, and develops an ensemble approach EAC that enhances exploration and performance in deep reinforcement learning.
Contribution
The paper presents a new policy iteration theory supporting generalized entropy measures, leading to novel algorithms TAC, RAC, and an ensemble method EAC for improved exploration and learning.
Findings
TAC and RAC outperform SAC on benchmark tasks.
EAC achieves state-of-the-art results with better sample efficiency.
Generalized entropy measures enhance exploration and policy performance.
Abstract
We propose a new policy iteration theory as an important extension of soft policy iteration and Soft Actor-Critic (SAC), one of the most efficient model free algorithms for deep reinforcement learning. Supported by the new theory, arbitrary entropy measures that generalize Shannon entropy, such as Tsallis entropy and Renyi entropy, can be utilized to properly randomize action selection while fulfilling the goal of maximizing expected long-term rewards. Our theory gives birth to two new algorithms, i.e., Tsallis entropy Actor-Critic (TAC) and Renyi entropy Actor-Critic (RAC). Theoretical analysis shows that these algorithms can be more effective than SAC. Moreover, they pave the way for us to develop a new Ensemble Actor-Critic (EAC) algorithm in this paper that features the use of a bootstrap mechanism for deep environment exploration as well as a new value-function based mechanism for…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsReinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Adaptive Dynamic Programming Control
