EnTRPO: Trust Region Policy Optimization Method with Entropy Regularization

Sahar Roostaie; Mohammad Mehdi Ebadzadeh

arXiv:2110.13373·cs.LG·October 27, 2021

TL;DR

EnTRPO extends Trust Region Policy Optimization with entropy regularization and a replay buffer, which improves control performance on reinforcement learning tasks such as Cart-Pole.

Contribution

This work introduces EnTRPO, a novel variant of TRPO that incorporates entropy regularization and off-policy replay buffers to improve policy learning.

Findings

1. EnTRPO outperforms TRPO in Cart-Pole control tasks.

2. Entropy regularization improves exploration and policy robustness.

3. Replay buffers help incorporate off-policy data into TRPO.
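
The replay-buffer idea in the findings above can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `ReplayBuffer`, the transition layout, and the first-in-first-out eviction rule are assumptions chosen for illustration, and the summary does not say how EnTRPO samples from or weights the stored data.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-capacity store of transitions from earlier policies."""

    def __init__(self, capacity, seed=0):
        # A deque with maxlen drops the oldest transition once capacity is hit.
        self.storage = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement; never ask for more than is stored.
        k = min(batch_size, len(self.storage))
        return self.rng.sample(list(self.storage), k)

    def __len__(self):
        return len(self.storage)


# Usage: five transitions go into a buffer that holds three, so only the
# three most recent ones remain available for sampling.
buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.add(state=t, action=t % 2, reward=1.0, next_state=t + 1, done=False)
batch = buffer.sample(batch_size=2)
```

A purely on-policy method would clear its data after every update; keeping a bounded window of older transitions is one simple way to reuse experience gathered under previous policies.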

Abstract

Trust Region Policy Optimization (TRPO) is a popular and empirically successful policy search algorithm in reinforcement learning (RL). It iteratively solves a surrogate problem that restricts consecutive policies to be close to each other. TRPO is an on-policy algorithm. On-policy methods bring many benefits, such as the ability to gauge each resulting policy. However, they typically discard all knowledge about the policies that existed before. In this work, we use a replay buffer to bring ideas from the off-policy learning setting into TRPO. Entropy regularization is commonly used to improve policy optimization in reinforcement learning. It is thought to aid exploration and generalization by encouraging more random policy choices. We add an entropy regularization term to the advantage over π, accumulated over time steps, in TRPO. We call this update EnTRPO. Our experiments demonstrate…
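
One way to read "an entropy regularization term added to the advantage, accumulated over time steps" is sketched below. The exact form used in the paper is not given in this summary, so everything here is an assumption: the function names, the coefficient `beta`, the discount `gamma`, and the choice to add a discounted sum of future policy entropies to each advantage estimate.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_augmented_advantages(advantages, action_probs, beta=0.01, gamma=0.99):
    """Add a discounted, accumulated entropy bonus to each advantage.

    Assumed form (hypothetical, not taken from the paper):
        A_ent[t] = A[t] + beta * sum_{k >= t} gamma**(k - t) * H(pi(.|s_k))
    """
    out = [0.0] * len(advantages)
    bonus = 0.0
    # Walk backwards so the discounted entropy sum is built in one pass.
    for t in reversed(range(len(advantages))):
        bonus = entropy(action_probs[t]) + gamma * bonus
        out[t] = advantages[t] + beta * bonus
    return out


# Usage: a two-step trajectory with a uniform policy at step 0 and a
# deterministic policy at step 1.
augmented = entropy_augmented_advantages(
    advantages=[0.0, 0.0],
    action_probs=[[0.5, 0.5], [1.0, 0.0]],
    beta=1.0,
    gamma=0.5,
)
```

Under this reading, states where the policy is still uncertain receive a larger effective advantage, which discourages the policy from collapsing to deterministic choices too early.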

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Memory and Neural Computing · Fuel Cells and Related Materials

Methods: Trust Region Policy Optimization · Entropy Regularization