Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning
Michelle A. Lee, Carlos Florensa, Jonathan Tremblay, Nathan Ratliff, Animesh Garg, Fabio Ramos, Dieter Fox

TL;DR
This paper introduces GUAPO, a hybrid approach combining model-based and learning-based policies using uncertainty estimates to improve sample efficiency and robustness in robotic policy learning, demonstrated on a real-world peg insertion task.
Contribution
The paper presents GUAPO, a novel method that integrates model-based and locally learned policies guided by uncertainty estimates for more efficient robotic learning.
Findings
Effective on a real-world peg insertion task
Reduces reliance on accurate models and perception
Improves sample efficiency and robustness
Abstract
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state. On the other hand, reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle. In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline, while requiring minimal interactions with the environment. This is achieved by leveraging uncertainty estimates to divide the space into regions where the given model-based policy is reliable, and regions where it may have flaws or not be well defined. In these uncertain regions, we show…
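The region split described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the perception system reports a goal position estimate as a per-axis mean and standard deviation, and it treats any position within `k` standard deviations of that estimate as the uncertain region. The names `scaled_distance`, `in_uncertain_region`, `select_action`, `step_toward`, and `zero_policy`, the threshold `k`, and the diagonal-covariance distance are all hypothetical choices made for illustration.

```python
import math
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]


def scaled_distance(position: Vector, goal_mean: Vector, goal_std: Vector) -> float:
    """Distance to the estimated goal, with each axis divided by its standard deviation."""
    return math.sqrt(
        sum(((p - g) / s) ** 2 for p, g, s in zip(position, goal_mean, goal_std))
    )


def in_uncertain_region(
    position: Vector, goal_mean: Vector, goal_std: Vector, k: float = 2.0
) -> bool:
    """Assumed rule: the uncertain region is everything within k standard deviations."""
    return scaled_distance(position, goal_mean, goal_std) <= k


def select_action(
    position: Vector,
    goal_mean: Vector,
    goal_std: Vector,
    model_based_policy: Callable[[Vector, Vector], Tuple[float, ...]],
    learned_policy: Callable[[Vector], Tuple[float, ...]],
    k: float = 2.0,
) -> Tuple[str, Tuple[float, ...]]:
    """Use the learned policy inside the uncertain region, the model-based one outside."""
    if in_uncertain_region(position, goal_mean, goal_std, k):
        return "learned", learned_policy(position)
    return "model_based", model_based_policy(position, goal_mean)


def step_toward(position: Vector, target: Vector, step: float = 0.05) -> Tuple[float, ...]:
    """Placeholder model-based policy: a bounded straight-line step toward the target."""
    delta = [t - p for p, t in zip(position, target)]
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= step:
        return tuple(delta)
    return tuple(d / norm * step for d in delta)


def zero_policy(position: Vector) -> Tuple[float, ...]:
    """Placeholder for a locally trained policy; a real one would be learned with RL."""
    return tuple(0.0 for _ in position)


# Hypothetical goal estimate in metres, with per-axis standard deviations.
goal_mean = (0.5, 0.0, 0.2)
goal_std = (0.01, 0.01, 0.02)

far_source, far_action = select_action(
    (0.0, 0.0, 0.2), goal_mean, goal_std, step_toward, zero_policy
)
near_source, near_action = select_action(
    (0.505, 0.0, 0.21), goal_mean, goal_std, step_toward, zero_policy
)
```

In this sketch the far position is handled by the model-based placeholder and the near position by the learned placeholder. The hard switch keeps the learned policy's exploration confined to a small region, which is one plausible reading of how the split could reduce the number of environment interactions needed.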
