Planning with Exploration: Addressing Dynamics Bottleneck in Model-based Reinforcement Learning

Xiyao Wang; Junge Zhang; Wenzhen Huang; Qiyue Yin

arXiv:2010.12914·cs.LG·June 25, 2021

Open Access

TL;DR

This paper identifies the trajectory reward estimation error as the cause of the dynamics bottleneck in model-based reinforcement learning and proposes MOPE2, an exploration method that improves sample efficiency by reducing this error.

Contribution

It introduces a theoretical analysis linking exploration to reward estimation error and proposes MOPE2, a new exploration strategy that alleviates the dynamics bottleneck in MBRL.

Findings

01

MOPE2 effectively alleviates the dynamics bottleneck.

02

MOPE2 achieves higher sample efficiency on complex benchmarks.

03

Theoretical bounds connect exploration to reward estimation error.

Abstract

Model-based reinforcement learning (MBRL) is believed to be more sample-efficient than model-free reinforcement learning (MFRL). However, MBRL is plagued by the dynamics bottleneck dilemma: the algorithm's performance gets stuck in a local optimum instead of improving as the number of interaction steps with the environment increases, so more data does not bring better performance. In this paper, we show through theoretical analysis that the trajectory reward estimation error is the main cause of the dynamics bottleneck dilemma. We give an upper bound on the trajectory reward estimation error and point out that increasing the agent's exploration ability is the key to reducing this error, thereby alleviating the dynamics bottleneck dilemma. Motivated by this, a model-based control method combined with…
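
To make the notion of a trajectory reward estimation error concrete, the following is a minimal toy sketch, not the paper's formulation. It assumes a hypothetical one-dimensional linear system `s' = a * s + u`, a hypothetical quadratic reward, and a learned model that differs from the true dynamics only in the coefficient `a`. All function names and constants are illustrative assumptions.

```python
def rollout_return(a, s0, actions):
    """Sum of rewards along a rollout of the toy dynamics s' = a * s + u.

    The reward r(s, u) = -(s**2) - 0.1 * u**2 is an assumed toy reward.
    """
    s, total = s0, 0.0
    for u in actions:
        total += -(s ** 2) - 0.1 * u ** 2
        s = a * s + u
    return total


def trajectory_reward_error(a_true, a_model, s0, actions):
    """Absolute gap between the return predicted by the model and the true return."""
    predicted = rollout_return(a_model, s0, actions)
    actual = rollout_return(a_true, s0, actions)
    return abs(predicted - actual)


# Same action sequence evaluated with an exact and a biased model (assumed values).
short_plan = [0.0] * 10
long_plan = [0.0] * 20

exact = trajectory_reward_error(0.9, 0.9, 1.0, short_plan)
biased_short = trajectory_reward_error(0.9, 0.8, 1.0, short_plan)
biased_long = trajectory_reward_error(0.9, 0.8, 1.0, long_plan)

print(exact, biased_short, biased_long)
```

In this toy setting a planner that ranks action sequences by the model's predicted return can be misled whenever the gap is large, and the gap accumulates over the planning horizon. How the paper bounds this error and how MOPE2 reduces it is not reproduced here.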

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Viral Infectious Diseases and Gene Expression in Insects