Model Embedding Model-Based Reinforcement Learning

Xiaoyu Tan; Chao Qu; Junwu Xiong; James Zhang

arXiv:2006.09234·cs.LG·June 17, 2020

TL;DR

This paper introduces MEMB, a model-embedding model-based reinforcement learning algorithm that balances sample efficiency and model bias by training on both real and imaginary (model-generated) data, and reports state-of-the-art results on several benchmarks.

Contribution

The paper proposes MEMB, a novel algorithm that embeds the learned model in the policy update within a probabilistic reinforcement learning framework, balancing sample efficiency and model bias.

Findings

01

MEMB achieves state-of-the-art performance on benchmarks.

02

A theoretical analysis supports MEMB under Lipschitz continuity assumptions on the model and the policy.

03

Embedding the model in the policy update improves sample efficiency.

Abstract

Model-based reinforcement learning (MBRL) has shown advantages in sample efficiency over model-free reinforcement learning (MFRL). Despite the impressive results it achieves, it still faces a trade-off between the ease of data generation and model bias. In this paper, we propose a simple and elegant model-embedding model-based reinforcement learning (MEMB) algorithm in the framework of probabilistic reinforcement learning. To balance sample efficiency and model bias, we exploit both real and imaginary data in training. In particular, we embed the model in the policy update and learn $Q$ and $V$ functions from the real data set. We provide a theoretical analysis of MEMB under the assumption that the model and policy are Lipschitz continuous. Finally, we evaluate MEMB on several benchmarks and demonstrate that our algorithm can achieve state-of-the-art performance.
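
The page gives no update equations, so the following is only a hypothetical toy sketch of the idea described in the abstract, not the authors' implementation. It assumes a soft (maximum-entropy) actor-critic-style objective: states are drawn from the real data set, a value function V is taken as already fit on real data, and the policy is improved by differentiating r(s,a) + gamma*V(f(s,a)) - alpha*log pi(a|s) through a learned model f using the reparameterization trick. The 1-D linear model, quadratic reward, fixed quadratic V, and every name in the code (`learned_model`, `objective_grad`, `train_policy`, etc.) are invented for illustration.

```python
import math
import random

# Hypothetical toy setting (NOT from the paper): 1-D state s and action a.
GAMMA = 0.99  # discount factor
ALPHA = 0.1   # entropy temperature of the soft objective
SIGMA = 0.1   # fixed standard deviation of the Gaussian policy


def learned_model(s, a):
    """Hypothetical learned dynamics s' = f(s, a); here simply s + a."""
    return s + a


def learned_reward(s, a):
    """Hypothetical learned reward r(s, a) = -s^2 - a^2."""
    return -s * s - a * a


def value_fn(s):
    """Stand-in for a V function fit on real transitions: V(s) = -s^2."""
    return -s * s


def sample_action(k, s, eps):
    """Reparameterized Gaussian policy: a = k*s + SIGMA*eps, eps ~ N(0, 1)."""
    return k * s + SIGMA * eps


def log_prob(eps):
    """log pi(a|s) expressed through the noise eps (it does not depend on k)."""
    return -0.5 * eps * eps - math.log(SIGMA) - 0.5 * math.log(2 * math.pi)


def soft_objective(k, states, noises):
    """Monte Carlo estimate of E[r(s,a) + GAMMA*V(f(s,a)) - ALPHA*log pi(a|s)]."""
    total = 0.0
    for s, eps in zip(states, noises):
        a = sample_action(k, s, eps)
        s_next = learned_model(s, a)  # one imagined step through the model
        total += learned_reward(s, a) + GAMMA * value_fn(s_next) - ALPHA * log_prob(eps)
    return total / len(states)


def objective_grad(k, states, noises):
    """Pathwise gradient dJ/dk, chained by hand through reward, model and V.

    dr/da = -2a, dV/ds' = -2s', ds'/da = 1, da/dk = s. The entropy term has
    zero gradient w.r.t. k because SIGMA is fixed.
    """
    total = 0.0
    for s, eps in zip(states, noises):
        a = sample_action(k, s, eps)
        s_next = learned_model(s, a)
        total += (-2.0 * a - 2.0 * GAMMA * s_next) * s
    return total / len(states)


def train_policy(steps=200, batch=1024, lr=0.1, seed=0):
    """Gradient ascent on the policy gain k using model-embedded gradients."""
    rng = random.Random(seed)
    k = 0.5  # deliberately poor initial gain
    for _ in range(steps):
        states = [rng.gauss(0.0, 1.0) for _ in range(batch)]  # "real" states
        noises = [rng.gauss(0.0, 1.0) for _ in range(batch)]  # policy noise
        k += lr * objective_grad(k, states, noises)
    return k
```

For this toy problem the objective is maximized at k* = -gamma/(1+gamma), so `train_policy()` should return a gain close to that value. The design choice being illustrated is that the policy gradient flows through the model's prediction of the next state, while V (and, in the paper, Q) would be learned from real transitions only.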

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Control Systems Optimization · Fault Detection and Control Systems