Model-Free Trajectory-based Policy Optimization with Monotonic Improvement

Riad Akrour; Abbas Abdolmaleki; Hany Abdulsamad; Jan Peters; Gerhard Neumann

arXiv:1606.09197 · cs.LG · July 3, 2018 · 19 cites

TL;DR

This paper introduces a model-free, trajectory-based policy optimization algorithm with guaranteed monotonic improvement. It learns a local quadratic Q-function directly from trajectory data, which avoids the bias introduced by approximating the system dynamics.

Contribution

It proposes a novel model-free policy optimization method that ensures monotonic improvement without relying on system dynamics linearization.

Findings

01. Demonstrates improved performance on nonlinear control tasks relative to approaches that linearize the system dynamics.

02. Ensures exact KL-constraint satisfaction in policy updates.

03. Provides theoretical guarantees of monotonic improvement.
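
To make the KL-constrained update in finding 02 concrete, here is a minimal sketch under strong simplifying assumptions: a scalar action, no state dependence, and a concave quadratic Q(a) = q_a * a - 0.5 * q_aa * a^2. It is not the paper's algorithm. It uses the standard closed-form solution pi_new(a) ∝ pi_old(a) * exp(Q(a) / eta) of a KL-regularized objective and searches for the temperature eta at which KL(pi_new || pi_old) meets a bound eps. All function names and numbers are hypothetical.

```python
import math


def kl_gauss(mu_p, var_p, mu_q, var_q):
    """KL(p || q) between two univariate Gaussians."""
    return 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)


def soft_update(mu_old, var_old, q_a, q_aa, eta):
    """Gaussian pi_new(a) proportional to pi_old(a) * exp(Q(a) / eta).

    Assumes Q(a) = q_a * a - 0.5 * q_aa * a**2 with q_aa > 0, so the
    product of the old density and exp(Q / eta) is again Gaussian.
    """
    prec_new = 1.0 / var_old + q_aa / eta
    mu_new = (mu_old / var_old + q_a / eta) / prec_new
    return mu_new, 1.0 / prec_new


def solve_eta(mu_old, var_old, q_a, q_aa, eps, lo=1e-8, hi=1e8, iters=100):
    """Geometric bisection on eta; the KL shrinks as eta grows."""
    for _ in range(iters):
        eta = math.sqrt(lo * hi)
        mu, var = soft_update(mu_old, var_old, q_a, q_aa, eta)
        if kl_gauss(mu, var, mu_old, var_old) > eps:
            lo = eta  # step too aggressive: raise the temperature
        else:
            hi = eta  # feasible: try a more aggressive step
    return hi  # always on the feasible side of the bound


def expected_q(mu, var, q_a, q_aa):
    """E[Q(a)] for a ~ N(mu, var) under the quadratic Q above."""
    return q_a * mu - 0.5 * q_aa * (mu ** 2 + var)


# Hypothetical numbers, chosen only for illustration.
MU_OLD, VAR_OLD, Q_A, Q_AA, EPS = 0.0, 1.0, 1.0, 2.0, 0.1
eta = solve_eta(MU_OLD, VAR_OLD, Q_A, Q_AA, EPS)
mu_new, var_new = soft_update(MU_OLD, VAR_OLD, Q_A, Q_AA, eta)
print(kl_gauss(mu_new, var_new, MU_OLD, VAR_OLD))
print(expected_q(mu_new, var_new, Q_A, Q_AA), expected_q(MU_OLD, VAR_OLD, Q_A, Q_AA))
```

In this toy setting the updated policy lands on the KL bound up to numerical precision and has a higher expected Q than the old one. The paper's setting is richer: its Q-function and policies are time-dependent and depend on the state.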

Abstract

Many recent trajectory optimization algorithms alternate between a linear approximation of the system dynamics around the mean trajectory and a conservative policy update. One way of constraining the policy change is to bound the Kullback-Leibler (KL) divergence between successive policies. These approaches have already demonstrated great experimental success on challenging problems such as end-to-end control of physical systems. However, the linear approximation of the system dynamics can introduce a bias in the policy update and prevent convergence to the optimal policy. In this article, we propose a new model-free trajectory-based policy optimization algorithm with guaranteed monotonic improvement. The algorithm backpropagates a local, quadratic and time-dependent Q-function learned from trajectory data instead of a model of the system dynamics. Our policy update ensures exact…
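
As an illustration of what "a local, quadratic Q-function learned from trajectory data" can look like in practice, the sketch below fits a quadratic model in (state, action) to sampled return targets by ordinary least squares. This is an assumption made for illustration only: the abstract does not specify the estimator, and the names `quad_features`, `fit_quadratic_q`, and the synthetic data are hypothetical. For a time-dependent Q-function, one such fit would be run per time step.

```python
import numpy as np


def quad_features(s, a):
    """Features [1, x, x_i * x_j for i <= j] with x = [s, a] stacked per sample."""
    x = np.concatenate([s, a], axis=1)            # shape (n, d)
    rows, cols = np.triu_indices(x.shape[1])      # upper-triangular index pairs
    quad = x[:, rows] * x[:, cols]                # shape (n, d * (d + 1) / 2)
    return np.hstack([np.ones((x.shape[0], 1)), x, quad])


def fit_quadratic_q(s, a, q_targets):
    """Least-squares weights of a quadratic Q model (plain OLS, an assumption)."""
    phi = quad_features(s, a)
    weights, *_ = np.linalg.lstsq(phi, q_targets, rcond=None)
    return weights


def predict_q(weights, s, a):
    """Evaluate the fitted quadratic model at the given states and actions."""
    return quad_features(s, a) @ weights


# Synthetic, noise-free data from a known quadratic, for illustration only.
rng = np.random.default_rng(0)
s = rng.normal(size=(200, 2))                     # 2-D states
a = rng.normal(size=(200, 1))                     # 1-D actions
q_true = (1.0 + s @ np.array([0.5, -0.2]) + 0.3 * a[:, 0]
          - a[:, 0] ** 2 + 0.4 * s[:, 0] * a[:, 0])

w = fit_quadratic_q(s, a, q_true)
max_err = np.max(np.abs(predict_q(w, s, a) - q_true))
print(w.shape, max_err)
```

With real rollouts the targets would be noisy return estimates, so some form of regularization or sample weighting would likely be needed; none is included here.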

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Model Reduction and Neural Networks · Machine Learning and Algorithms