Plan Online, Learn Offline: Efficient Learning and Exploration via Model-Based Control

Kendall Lowrey; Aravind Rajeswaran; Sham Kakade; Emanuel Todorov; Igor Mordatch

arXiv:1811.01848·cs.LG·January 29, 2019·66 cites

TL;DR

The paper introduces POLO, a framework combining local model-based control and global value learning, enabling efficient, stable, and exploratory learning for complex control tasks with minimal real-world experience.

Contribution

It presents a novel integration of trajectory optimization, value function learning, and exploration strategies to improve sample efficiency and stability in continuous control tasks.

Findings

01 Enables complex control tasks with minutes of real-world experience

02 Improves stability and speed of value function learning

03 Uses trajectory optimization for coordinated exploration

Abstract

We propose a plan online and learn offline (POLO) framework for the setting where an agent, with an internal model, needs to continually act and learn in the world. Our work builds on the synergistic relationship between local model-based control, global value function learning, and exploration. We study how local trajectory optimization can cope with approximation errors in the value function, and can stabilize and accelerate value function learning. Conversely, we also study how approximate value functions can help reduce the planning horizon and allow for better policies beyond local solutions. Finally, we also demonstrate how trajectory optimization can be used to perform temporally coordinated exploration in conjunction with estimating uncertainty in value function approximation. This exploration is critical for fast and stable learning of the value function. Combining these…
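
To illustrate how an approximate value function can shorten the planning horizon, here is a minimal sketch of short-horizon lookahead with a terminal value estimate. It is not the authors' implementation: the 1-D dynamics in `step`, the `reward`, the discrete `ACTIONS` set, and `rough_value` are all hypothetical, and brute-force enumeration of action sequences stands in for a real trajectory optimizer.

```python
import itertools

ACTIONS = (-1.0, 0.0, 1.0)  # hypothetical discrete action set


def step(x, a):
    """Hypothetical internal model: a 1-D point nudged by the action."""
    return x + 0.1 * a


def reward(x, a):
    """Hypothetical reward: stay near the origin, with a small action cost."""
    return -(x ** 2) - 0.01 * (a ** 2)


def plan(x0, value_fn, horizon=4, gamma=0.99):
    """Score every action sequence of length `horizon` by its discounted
    reward plus a discounted terminal value, and return the first action
    of the best sequence (receding-horizon control)."""
    best_return, best_first = float("-inf"), None
    for seq in itertools.product(ACTIONS, repeat=horizon):
        x, total = x0, 0.0
        for t, a in enumerate(seq):
            total += (gamma ** t) * reward(x, a)
            x = step(x, a)
        # The terminal value summarizes everything beyond the horizon.
        total += (gamma ** horizon) * value_fn(x)
        if total > best_return:
            best_return, best_first = total, seq[0]
    return best_first


def rough_value(x):
    """Hypothetical approximate value function (assumed, not learned here)."""
    return -10.0 * x ** 2


first_action = plan(2.0, rough_value)  # action to execute before replanning
```

In this toy setup the planner only looks four steps ahead, and the terminal value term carries the long-horizon information. In a learning loop, `rough_value` would be replaced by a fitted approximator that is updated offline from the planner's experience.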

Taxonomy

Topics: Reinforcement Learning in Robotics · Model Reduction and Neural Networks · Simulation Techniques and Applications