PLATO: Policy Learning using Adaptive Trajectory Optimization
Gregory Kahn, Tianhao Zhang, Sergey Levine, Pieter Abbeel

TL;DR
PLATO trains complex control policies with supervised learning, using adaptive model-predictive control (MPC) to generate the supervision. Because the partially trained policy is never executed during training, learning is faster and produces fewer failures than with prior methods.
Contribution
PLATO introduces an adaptive MPC-based supervision method for training high-dimensional policies without unsafe exploration, improving training speed and safety.
Findings
Faster policy learning compared to prior methods.
Significantly fewer catastrophic failures during training.
Achieves better policy performance in aerial vehicle tasks.
Abstract
Policy search can in principle acquire complex strategies for control of robots and other autonomous systems. When the policy is trained to process raw sensory inputs, such as images and depth maps, it can also acquire a strategy that combines perception and control. However, effectively processing such complex inputs requires an expressive policy class, such as a large neural network. These high-dimensional policies are difficult to train, especially when learning to control safety-critical systems. We propose PLATO, an algorithm that trains complex control policies with supervised learning, using model-predictive control (MPC) to generate the supervision, so that a partially trained and potentially unsafe policy never needs to be run. PLATO uses an adaptive training method to modify the behavior of MPC to gradually match the learned policy in order to generate training samples at…
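The training loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the 1-D system, the fixed feedback law standing in for MPC, the linear policy, and the `mix` blending weight are all assumptions made for illustration. The abstract states only that MPC behavior is modified to gradually match the learned policy; the simple action blending below is a stand-in for that adaptation.

```python
import numpy as np

def expert_action(x, k_expert=0.8):
    # Hypothetical stand-in for the MPC teacher: a fixed stabilizing feedback law.
    return -k_expert * x

def train_with_adaptive_teacher(iterations=10, horizon=20, mix=0.5, seed=0):
    rng = np.random.default_rng(seed)
    k_policy = 0.0  # learned policy is u = -k_policy * x (assumed linear form)
    states, labels = [], []
    for _ in range(iterations):
        x = rng.uniform(-1.0, 1.0)
        for _ in range(horizon):
            u_expert = expert_action(x)
            u_policy = -k_policy * x
            # Adaptive teacher (assumed form): the executed action is pulled
            # toward the current policy, so visited states resemble the ones
            # the policy would reach. The policy alone never drives the system.
            u_exec = (1.0 - mix) * u_expert + mix * u_policy
            # Supervision comes from the unmodified teacher at the visited state.
            states.append(x)
            labels.append(u_expert)
            # Toy dynamics (assumed): single integrator with small noise.
            x = x + u_exec + 0.01 * rng.standard_normal()
        # Supervised step: least-squares fit of u = -k * x on all data so far.
        X = np.array(states)
        U = np.array(labels)
        k_policy = -float(X @ U) / float(X @ X)
    return k_policy

if __name__ == "__main__":
    print(train_with_adaptive_teacher())
```

In this sketch the data are collected under the blended controller while the labels always come from the teacher, which mirrors the abstract's point that the partially trained policy is never run on its own.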
