Predictor-Corrector Policy Optimization
Ching-An Cheng, Xinyan Yan, Nathan Ratliff, Byron Boots

TL;DR
PicCoLO is a framework that augments first-order model-free reinforcement and imitation learning algorithms with a predictive model and a correction step, accelerating policy learning while avoiding model bias.
Contribution
The paper introduces PicCoLO, a predictor-corrector framework that systematically improves policy optimization algorithms using predictable online learning techniques.
Findings
PicCoLO accelerates the convergence of first-order model-free algorithms.
The Correction Step uses the error between the true and predicted gradients to compensate for imperfect model predictions.
Both theoretical analysis and simulation results support the improvement.
Abstract
We present a predictor-corrector framework, called PicCoLO, that can transform a first-order model-free reinforcement or imitation learning algorithm into a new hybrid method that leverages predictive models to accelerate policy learning. The new "PicCoLOed" algorithm optimizes a policy by recursively repeating two steps: In the Prediction Step, the learner uses a model to predict the unseen future gradient and then applies the predicted estimate to update the policy; in the Correction Step, the learner runs the updated policy in the environment, receives the true gradient, and then corrects the policy using the gradient error. Unlike previous algorithms, PicCoLO corrects for the mistakes of using imperfect predicted gradients and hence does not suffer from model bias. The development of PicCoLO is made possible by a novel reduction from predictable online learning to adversarial online…
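The two steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes plain gradient descent with a fixed step size as the base algorithm, a toy quadratic loss standing in for the policy objective, and a hypothetical predictive model that simply reuses the last observed gradient. The names `true_gradient`, `piccolo_sketch`, `step`, and `iters` are illustrative only.

```python
import numpy as np


def true_gradient(x, target):
    # Gradient of the toy loss 0.5 * ||x - target||^2.
    # Assumption: stands in for the gradient obtained by running the policy
    # in the environment.
    return x - target


def piccolo_sketch(x0, target, step=0.5, iters=50):
    x = np.asarray(x0, dtype=float)
    target = np.asarray(target, dtype=float)
    # Hypothetical predictive model: predict that the next gradient equals
    # the last observed one (zero before anything has been observed).
    predicted = np.zeros_like(x)
    for _ in range(iters):
        # Prediction Step: update with the predicted gradient.
        x = x - step * predicted
        # Correction Step: observe the true gradient at the updated point
        # and correct the update using the gradient error.
        g = true_gradient(x, target)
        x = x - step * (g - predicted)
        # Refresh the model's prediction for the next round.
        predicted = g
    return x


if __name__ == "__main__":
    print(piccolo_sketch([5.0, -3.0], [1.0, 2.0]))
```

On this toy problem the iterate approaches `target` even though the predicted gradient is generally wrong, because each Correction Step subtracts the prediction error. A better predictive model would shrink that error and leave less for the correction to do.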
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms
