Quasi-Newton Trust Region Policy Optimization

Devesh Jha, Arvind Raghunathan, Diego Romeres

arXiv:1912.11912 · cs.LG · December 30, 2019 · 1 citation

TL;DR

This paper introduces QNTRPO, a trust region policy optimization method that combines a dogleg step with a Quasi-Newton approximation of the Hessian, improving convergence and sample efficiency on continuous control reinforcement learning tasks.

Contribution

The paper presents a trust region policy optimization algorithm that uses a Quasi-Newton approximation of the Hessian, addressing two drawbacks of gradient descent in reinforcement learning: the lack of a stepsize selection criterion and slow convergence.
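The abstract does not say which Quasi-Newton update QNTRPO uses. As an illustration only, the sketch below implements the standard BFGS update, one common way to build a Hessian approximation B from successive steps s and gradient changes y. The function name `bfgs_update` and the skip threshold `eps` are assumptions, not the authors' code.

```python
import numpy as np

def bfgs_update(B, s, y, eps=1e-8):
    """Standard BFGS update of a Hessian approximation B (illustrative sketch).

    s: step taken, x_new - x_old
    y: gradient change, g_new - g_old
    The update is skipped when the curvature condition s^T y > 0 fails,
    which keeps B symmetric positive definite.
    """
    sy = float(s @ y)
    if sy <= eps * np.linalg.norm(s) * np.linalg.norm(y):
        return B  # curvature condition violated: keep the old approximation
    Bs = B @ s
    # Remove the old curvature along s and add the observed curvature y y^T / s^T y
    return B - np.outer(Bs, Bs) / float(s @ Bs) + np.outer(y, y) / sy

# Usage on a quadratic f(x) = 0.5 x^T A x, whose gradient change is y = A s
A = np.array([[3.0, 1.0], [1.0, 2.0]])
s = np.array([1.0, 0.5])
B1 = bfgs_update(np.eye(2), s, A @ s)
```

After the update, B1 satisfies the secant equation B1 s = y, so it matches the true curvature along the most recent step while staying positive definite.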

Findings

01

Improves sample efficiency in continuous control tasks

02

Achieves faster convergence compared to existing methods

03

Improves performance across a wide range of challenging continuous control tasks

Abstract

We propose a trust region method for policy optimization that employs a Quasi-Newton approximation for the Hessian, called Quasi-Newton Trust Region Policy Optimization (QNTRPO). Gradient descent is the de facto algorithm for reinforcement learning tasks with continuous controls. The algorithm has achieved state-of-the-art performance when used in reinforcement learning across a wide range of tasks. However, the algorithm suffers from a number of drawbacks, including the lack of a stepsize selection criterion and slow convergence. We investigate the use of a trust region method using a dogleg step and a Quasi-Newton approximation for the Hessian for policy optimization. We demonstrate through numerical experiments over a wide range of challenging continuous control tasks that our particular choice is efficient in terms of the number of samples and improves performance.
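To make the "dogleg step" and the stepsize selection criterion concrete, here is a minimal sketch of the textbook dogleg step for the model m(p) = g^T p + 0.5 p^T B p, together with a conventional ratio-based radius update. It assumes a Euclidean trust region and a symmetric positive definite B (for example, a Quasi-Newton approximation). The abstract does not state how QNTRPO measures the trust region or which constants it uses, so the names `dogleg_step` and `update_radius` and the 0.25/0.75/2 thresholds are illustrative assumptions, not the authors' method.

```python
import numpy as np

def dogleg_step(g, B, delta):
    """Approximately minimize m(p) = g^T p + 0.5 p^T B p s.t. ||p|| <= delta.

    Assumes B is symmetric positive definite.
    """
    p_newton = -np.linalg.solve(B, g)
    if np.linalg.norm(p_newton) <= delta:
        return p_newton  # full Quasi-Newton step fits inside the region
    # Cauchy point: unconstrained minimizer of m along the steepest-descent direction
    p_cauchy = -(g @ g) / (g @ B @ g) * g
    if np.linalg.norm(p_cauchy) >= delta:
        return -delta * g / np.linalg.norm(g)  # truncated steepest-descent step
    # Walk from the Cauchy point toward the Newton point until hitting the boundary:
    # solve ||p_c + tau * (p_n - p_c)||^2 = delta^2 for tau in (0, 1)
    d = p_newton - p_cauchy
    a = d @ d
    b = 2.0 * (p_cauchy @ d)
    c = p_cauchy @ p_cauchy - delta**2  # negative, so a real root exists
    tau = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return p_cauchy + tau * d

def update_radius(rho, delta, step_norm, delta_max=10.0):
    """Adjust the radius from rho = actual reduction / predicted reduction."""
    if rho < 0.25:
        return 0.25 * delta  # model was a poor predictor: shrink
    if rho > 0.75 and np.isclose(step_norm, delta):
        return min(2.0 * delta, delta_max)  # good prediction at the boundary: expand
    return delta

# Usage: a step that must bend along the dogleg path
g = np.array([1.0, 1.0])
B = np.diag([1.0, 10.0])
p = dogleg_step(g, B, 0.6)
model_value = g @ p + 0.5 * p @ B @ p
```

The ratio test replaces a hand-tuned learning rate: the radius, and hence the step length, grows or shrinks depending on how well the quadratic model predicted the actual change in the objective.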

Taxonomy

Topics: Optimization and Search Problems · Reinforcement Learning in Robotics · Distributed Control Multi-Agent Systems