Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees

Yuping Luo; Huazhe Xu; Yuanzhi Li; Yuandong Tian; Trevor Darrell; Tengyu Ma

arXiv:1807.03858 · cs.LG · February 16, 2021 · 101 cites

TL;DR

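This paper introduces an algorithmic framework for model-based deep reinforcement learning. Its meta-algorithm is guaranteed to improve monotonically to a local maximum of the expected reward, and the framework extends the optimism-in-face-of-uncertainty principle to non-linear dynamical models without explicit uncertainty quantification.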

Contribution

It proposes a novel meta-algorithm with theoretical guarantees and instantiates it as SLBO, achieving state-of-the-art results with limited samples.

Findings

01. SLBO achieves state-of-the-art performance on continuous control tasks.

02. The framework extends optimism-in-face-of-uncertainty to non-linear dynamical models.

03. The meta-algorithm guarantees monotone improvement to a local maximum of the expected reward.

Abstract

Model-based reinforcement learning (RL) is considered to be a promising approach to reduce the sample complexity that hinders model-free RL. However, the theoretical understanding of such methods has been rather limited. This paper introduces a novel algorithmic framework for designing and analyzing model-based RL algorithms with theoretical guarantees. We design a meta-algorithm with a theoretical guarantee of monotone improvement to a local maximum of the expected reward. The meta-algorithm iteratively builds a lower bound of the expected reward based on the estimated dynamical model and sample trajectories, and then maximizes the lower bound jointly over the policy and the model. The framework extends the optimism-in-face-of-uncertainty principle to non-linear dynamical models in a way that requires no explicit uncertainty quantification. Instantiating our framework with…
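The loop structure described in the abstract (collect samples, build a penalized estimate of the reward from the model, maximize jointly over policy and model) can be illustrated with a minimal toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the one-step scalar environment, the names `real_outcome`, `reward`, and `run_meta_algorithm`, the absolute-error penalty with weight `lam`, the trust region of radius `delta`, and the grid search. It does not reproduce the paper's lower bound, its guarantee, or SLBO.

```python
import numpy as np

TRUE_GAIN = 2.0  # toy "real" dynamics; hidden from the learner (assumption)


def real_outcome(theta):
    """Real environment: the outcome produced by playing policy parameter theta."""
    return TRUE_GAIN * theta


def reward(outcome):
    """Toy reward: highest when the outcome equals 1."""
    return -(outcome - 1.0) ** 2


def run_meta_algorithm(theta0=0.1, n_iters=5, delta=0.2, lam=10.0):
    """Iterate: collect data, build a penalized objective, maximize jointly."""
    theta_ref = theta0
    w_grid = np.linspace(0.0, 4.0, 401)  # candidate model parameters w
    true_returns = [reward(real_outcome(theta_ref))]
    for _ in range(n_iters):
        # 1. Collect a sample from the real environment with the current policy.
        y_obs = real_outcome(theta_ref)

        # 2. Build the penalized objective on a grid of (policy, model) pairs.
        #    The policy is restricted to a trust region around theta_ref.
        theta_grid = np.linspace(theta_ref - delta, theta_ref + delta, 41)
        T, W = np.meshgrid(theta_grid, w_grid, indexing="ij")
        model_value = reward(W * T)                # reward predicted by model w
        discrepancy = np.abs(W * theta_ref - y_obs)  # model error on real data
        objective = model_value - lam * discrepancy

        # 3. Maximize jointly over the policy and the model (grid search here).
        i, _ = np.unravel_index(np.argmax(objective), objective.shape)
        theta_ref = float(theta_grid[i])
        true_returns.append(reward(real_outcome(theta_ref)))
    return theta_ref, true_returns


if __name__ == "__main__":
    theta, returns = run_meta_algorithm()
    print("final policy parameter:", theta)
    print("true return per iteration:", returns)
```

In this toy setting the penalty keeps the model close to what the collected sample supports, so the jointly chosen policy is evaluated under a model that agrees with the real data near the current policy. A practical instantiation would replace the grid search with gradient-based optimization of neural network policies and models.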

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Control Systems Optimization · Evolutionary Algorithms and Applications