Predictor-Corrector Policy Optimization

Ching-An Cheng; Xinyan Yan; Nathan Ratliff; Byron Boots

arXiv:1810.06509 · cs.LG · May 28, 2019 · 1 citation

TL;DR

PicCoLO is a novel framework that enhances first-order reinforcement learning algorithms by incorporating predictive models and correction steps, leading to faster policy convergence without suffering from model bias.

Contribution

The paper introduces PicCoLO, a predictor-corrector framework that systematically improves policy optimization algorithms using predictable online learning techniques.

Findings

01  PicCoLO accelerates convergence of first-order algorithms.

02  The framework corrects for model prediction errors effectively.

03  Theoretical and simulation results demonstrate improved performance.

Abstract

We present a predictor-corrector framework, called PicCoLO, that can transform a first-order model-free reinforcement or imitation learning algorithm into a new hybrid method that leverages predictive models to accelerate policy learning. The new "PicCoLOed" algorithm optimizes a policy by recursively repeating two steps: In the Prediction Step, the learner uses a model to predict the unseen future gradient and then applies the predicted estimate to update the policy; in the Correction Step, the learner runs the updated policy in the environment, receives the true gradient, and then corrects the policy using the gradient error. Unlike previous algorithms, PicCoLO corrects for the mistakes of using imperfect predicted gradients and hence does not suffer from model bias. The development of PicCoLO is made possible by a novel reduction from predictable online learning to adversarial online…
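The two steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain gradient descent with a fixed step size as the base algorithm, uses the most recently observed gradient as the predictive model, and replaces the environment with a toy quadratic loss. The names `true_gradient` and `piccolo_sketch`, the target point, and the step size are all hypothetical.

```python
def true_gradient(theta):
    # Hypothetical stand-in for running the policy in the environment:
    # the gradient of the toy loss 0.5 * ||theta - target||^2.
    target = [1.0, -2.0]
    return [t - c for t, c in zip(theta, target)]


def piccolo_sketch(theta0, steps=100, eta=0.3):
    # Assumption: fixed step size eta and a "last gradient" predictive model.
    theta = list(theta0)
    g_hat = [0.0] * len(theta)  # initial prediction carries no information
    for _ in range(steps):
        # Prediction Step: update the policy with the predicted gradient.
        theta_hat = [t - eta * p for t, p in zip(theta, g_hat)]
        # Correction Step: evaluate the updated policy, observe the true
        # gradient, and correct using the gradient error (g - g_hat).
        g = true_gradient(theta_hat)
        theta = [t - eta * (gi - p) for t, gi, p in zip(theta_hat, g, g_hat)]
        # Update the predictive model: predict the gradient just observed.
        g_hat = g
    return theta


result = piccolo_sketch([0.0, 0.0])
print(result)
```

In this sketch the correction step subtracts only the error between the true and predicted gradients, so a perfect prediction leaves the predicted update untouched, while a poor prediction is undone by the correction. On this toy problem the iterates approach the hypothetical target point.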

Code & Models

Repositories

gtrll/rlfamily
none · Official

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms