EnTRPO: Trust Region Policy Optimization Method with Entropy Regularization
Sahar Roostaie, Mohammad Mehdi Ebadzadeh

TL;DR
EnTRPO enhances Trust Region Policy Optimization by integrating entropy regularization and replay buffers, leading to improved control performance in reinforcement learning tasks like Cart-Pole.
Contribution
This work introduces EnTRPO, a novel variant of TRPO that incorporates entropy regularization and off-policy replay buffers to improve policy learning.
Findings
EnTRPO outperforms TRPO in Cart-Pole control tasks.
Entropy regularization improves exploration and policy robustness.
Replay buffers help incorporate off-policy data into TRPO.
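The replay-buffer idea in the findings above can be sketched as follows. This summary does not describe the paper's buffer design, so the class name `ReplayBuffer`, the FIFO eviction, the uniform sampling, and the transition layout are all illustrative assumptions rather than the authors' implementation.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity FIFO store of transitions (hypothetical design)."""

    def __init__(self, capacity, seed=0):
        # A bounded deque silently drops the oldest transition when full.
        self._storage = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self._storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement, capped at the current size.
        k = min(batch_size, len(self._storage))
        return self._rng.sample(list(self._storage), k)

    def __len__(self):
        return len(self._storage)


# Usage: store five transitions in a buffer that only holds three.
buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.add(state=t, action=0, reward=1.0, next_state=t + 1, done=False)
batch = buffer.sample(batch_size=2)
```

Keeping older transitions available is what lets an otherwise on-policy update reuse data collected under earlier policies. How that data should be weighted or corrected is a separate design question that this sketch does not address.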
Abstract
Trust Region Policy Optimization (TRPO) is a popular and empirically successful policy search algorithm in reinforcement learning (RL). It iteratively solves a surrogate problem that restricts consecutive policies to be close to each other. TRPO is an on-policy algorithm. On-policy methods bring many benefits, like the ability to gauge each resulting policy. However, they typically discard all knowledge about the policies that existed before. In this work, we borrow a replay buffer from the off-policy learning setting and apply it to TRPO. Entropy regularization is commonly used to improve policy optimization in reinforcement learning. It is thought to aid exploration and generalization by encouraging more random policy choices. We add an entropy regularization term to the advantage over π, accumulated over time steps, in TRPO. We call this update EnTRPO. Our experiments demonstrate…
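One possible reading of "an entropy regularization term added to the advantage" is to add a scaled policy entropy to each time step's advantage estimate. The sketch below illustrates that reading for a discrete action space such as Cart-Pole's two actions. The coefficient `beta`, the function names, and the exact additive form are assumptions, since the abstract as quoted here does not give the formula.

```python
import math


def policy_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_augmented_advantages(advantages, action_probs, beta=0.01):
    """Add beta * H(pi(.|s_t)) to each time step's advantage estimate.

    beta is a hypothetical coefficient; the summary does not state one.
    """
    return [
        adv + beta * policy_entropy(probs)
        for adv, probs in zip(advantages, action_probs)
    ]


# Usage: two time steps, two actions each.
advantages = [1.0, -0.5]
action_probs = [[0.5, 0.5], [1.0, 0.0]]  # uniform vs. deterministic policy
augmented = entropy_augmented_advantages(advantages, action_probs, beta=0.1)
```

Under this form, a state where the policy is still uncertain receives a bonus, while a state where the policy is already deterministic receives none, which matches the stated goal of encouraging more random policy choices.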
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Memory and Neural Computing · Fuel Cells and Related Materials
Methods: Trust Region Policy Optimization · Entropy Regularization
