Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning

Michelle A. Lee; Carlos Florensa; Jonathan Tremblay; Nathan Ratliff; Animesh Garg; Fabio Ramos; Dieter Fox

arXiv:2005.10872·cs.RO·May 27, 2020

TL;DR

This paper introduces GUAPO, a hybrid approach combining model-based and learning-based policies using uncertainty estimates to improve sample efficiency and robustness in robotic policy learning, demonstrated on a real-world peg insertion task.

Contribution

The paper presents GUAPO, a novel method that integrates model-based and locally learned policies guided by uncertainty estimates for more efficient robotic learning.

Findings

01. Effective in real-world peg insertion task

02. Reduces reliance on accurate models and perception

03. Improves sample efficiency and robustness

Abstract

Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state. On the other hand, reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle. In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline, while requiring minimal interactions with the environment. This is achieved by leveraging uncertainty estimates to divide the space into regions where the given model-based policy is reliable, and regions where it may have flaws or not be well defined. In these uncertain regions, we show…
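
The region-based switching described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the box-shaped uncertain region (within `k` standard deviations of the estimated goal), the fixed-step model-based controller, and all function and parameter names below are assumptions made purely for illustration, and the learned policy is a placeholder callable.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def in_uncertain_region(position: Vector, goal_estimate: Vector,
                        goal_std: Vector, k: float = 2.0) -> bool:
    """Return True when `position` lies within k standard deviations of the
    estimated goal along every axis (an assumed definition of the region
    where the model-based policy is not trusted)."""
    return all(abs(p - g) <= k * s
               for p, g, s in zip(position, goal_estimate, goal_std))


def model_based_action(position: Vector, goal_estimate: Vector,
                       step: float = 0.01) -> List[float]:
    """Hypothetical model-based controller: move at most `step` toward the
    estimated goal along a straight line."""
    delta = [g - p for p, g in zip(position, goal_estimate)]
    norm = math.sqrt(sum(d * d for d in delta))
    if norm == 0.0:
        return [0.0] * len(delta)
    scale = min(step, norm) / norm
    return [d * scale for d in delta]


def hybrid_action(position: Vector, goal_estimate: Vector, goal_std: Vector,
                  learned_policy: Callable[[Vector], List[float]],
                  k: float = 2.0) -> List[float]:
    """Use the learned policy inside the uncertain region and the
    model-based controller everywhere else."""
    if in_uncertain_region(position, goal_estimate, goal_std, k):
        return learned_policy(position)
    return model_based_action(position, goal_estimate)
```

Under these assumptions, the learned policy only needs to cover the small region around the estimated goal, which is one way to read the abstract's claim of requiring minimal interaction with the environment.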
