Trust-Region-Free Policy Optimization for Stochastic Policies

Mingfei Sun; Benjamin Ellis; Anuj Mahajan; Sam Devlin; Katja Hofmann; Shimon Whiteson

arXiv:2302.07985 · cs.LG · February 17, 2023 · 1 citation

Open Access

TL;DR

This paper introduces TREFree, a policy optimization method that replaces the computationally intensive trust region constraint of TRPO with a trust-region-free constraint while keeping the monotonic improvement guarantee. The authors report better policy performance and sample efficiency than TRPO and PPO.

Contribution

The paper proposes a trust-region-free policy optimization algorithm that preserves monotonic improvement guarantees by generalizing the surrogate objective, simplifying implementation and enhancing efficiency.

Findings

01

TREFree outperforms TRPO and PPO in policy performance.

02

TREFree is more sample-efficient than TRPO and PPO.

03

The method maintains monotonic improvement without explicit trust regions.

Abstract

Trust Region Policy Optimization (TRPO) is an iterative method that simultaneously maximizes a surrogate objective and enforces a trust region constraint over consecutive policies in each iteration. The combination of the surrogate objective maximization and the trust region enforcement has been shown to be crucial to guarantee a monotonic policy improvement. However, solving a trust-region-constrained optimization problem can be computationally intensive as it requires many steps of conjugate gradient and a large number of on-policy samples. In this paper, we show that the trust region constraint over policies can be safely substituted by a trust-region-free constraint without compromising the underlying monotonic improvement guarantee. The key idea is to generalize the surrogate objective used in TRPO in a way that a monotonic improvement guarantee still emerges as a result of…
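For orientation, the two TRPO ingredients the abstract names, a surrogate objective and a trust region constraint over consecutive policies, can be sketched for a small discrete action space. This is only an illustration of the standard TRPO quantities, not the paper's trust-region-free method. The function names, the categorical policy representation, and the threshold `delta` are assumptions made for this example.

```python
import math


def surrogate_objective(new_probs, old_probs, advantages):
    """Importance-weighted advantage, averaged over sampled (state, action) pairs.

    new_probs[i] and old_probs[i] are the probabilities that the new and old
    policies assign to the action actually taken in sample i.
    """
    ratios = [p_new / p_old for p_new, p_old in zip(new_probs, old_probs)]
    return sum(r * a for r, a in zip(ratios, advantages)) / len(advantages)


def mean_kl(old_dists, new_dists):
    """Average KL(old || new) over states for categorical action distributions."""
    total = 0.0
    for old, new in zip(old_dists, new_dists):
        total += sum(p * math.log(p / q) for p, q in zip(old, new) if p > 0.0)
    return total / len(old_dists)


def satisfies_trust_region(old_dists, new_dists, delta):
    """True if the average KL divergence stays within the (assumed) radius delta."""
    return mean_kl(old_dists, new_dists) <= delta


# Hypothetical toy data: two states, two actions.
old_dists = [[0.5, 0.5], [0.5, 0.5]]
new_dists = [[0.6, 0.4], [0.5, 0.5]]
actions = [0, 1]          # action taken in each sampled state
advantages = [1.0, -0.5]  # estimated advantages under the old policy

old_probs = [dist[a] for dist, a in zip(old_dists, actions)]
new_probs = [dist[a] for dist, a in zip(new_dists, actions)]

objective = surrogate_objective(new_probs, old_probs, advantages)
within_region = satisfies_trust_region(old_dists, new_dists, delta=0.05)
```

In TRPO the constrained problem is solved at every iteration, which is where the conjugate gradient steps and on-policy samples mentioned in the abstract come in. The sketch only evaluates the two quantities for a fixed candidate policy.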

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Optimization and Search Problems

Methods: Trust Region Policy Optimization