Model-based Reinforcement Learning with Multi-step Plan Value Estimation

Haoxin Lin; Yihao Sun; Jiaji Zhang; Yang Yu

arXiv:2209.05530·cs.LG·September 14, 2022

TL;DR

This paper introduces MPPVE, a model-based reinforcement learning algorithm that uses multi-step plan value estimation to improve sample efficiency by better utilizing learned models despite their errors.

Contribution

It proposes a novel multi-step plan value estimation method and integrates it into a new algorithm, MPPVE, enhancing model-based RL performance.

Findings

01. MPPVE outperforms state-of-the-art model-based RL methods.
02. The approach improves sample efficiency in environments with model errors.
03. Multi-step plan evaluation enhances model utilization.

Abstract

A promising way to improve the sample efficiency of reinforcement learning is to use model-based methods, in which much of the exploration and evaluation can happen in the learned model, saving real-world samples. However, when the learned model has non-negligible error, sequences of steps in the model are hard to evaluate accurately, which limits how much the model can be used. This paper proposes to alleviate this issue by using multi-step plans in place of multi-step actions for model-based RL. We employ multi-step plan value estimation, which evaluates the expected discounted return after executing a sequence of action plans at a given state, and update the policy by directly computing the multi-step policy gradient via plan value estimation. The new model-based reinforcement learning algorithm MPPVE (Model-based Planning Policy Learning with Multi-step Plan Value Estimation) shows a…
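
To make the idea of a plan value concrete, the following is a minimal sketch and not the authors' implementation (the official code is in the repository listed under Code & Models). It assumes a deterministic toy model and a single rollout, whereas the abstract describes an expected return. The names `plan_value`, `toy_model`, and `terminal_value` are hypothetical and chosen only for illustration. The sketch scores a fixed sequence of actions by accumulating discounted model rewards and then bootstrapping from a value estimate at the final state.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical types for illustration: scalar states and actions.
State = float
Action = float
Model = Callable[[State, Action], Tuple[State, float]]


def plan_value(
    state: State,
    plan: Sequence[Action],
    model: Model,
    terminal_value: Callable[[State], float],
    gamma: float = 0.99,
) -> float:
    """Discounted return of executing `plan` in `model`, plus a bootstrap term.

    Computes sum_{m<k} gamma^m * r_m + gamma^k * terminal_value(s_k),
    where k = len(plan). This is a single deterministic rollout, so it
    only approximates an expectation when the model is deterministic.
    """
    total = 0.0
    discount = 1.0
    for action in plan:
        state, reward = model(state, action)  # one step in the learned model
        total += discount * reward
        discount *= gamma
    return total + discount * terminal_value(state)


def toy_model(state: State, action: Action) -> Tuple[State, float]:
    """Assumed stand-in for a learned model: move on a line, penalize distance from 0."""
    next_state = state + action
    return next_state, -abs(next_state)


# Two-step plan from state 1.0 with no bootstrap value.
two_step = plan_value(1.0, [-0.5, -0.5], toy_model, lambda s: 0.0, gamma=0.5)

# One-step plan with a constant bootstrap value of 2.0 at the final state.
one_step = plan_value(1.0, [-0.5], toy_model, lambda s: 2.0, gamma=0.5)
```

Here the whole action sequence is evaluated as one unit, so only the state reached after the last step needs a value estimate. This is one way to read the abstract's motivation: intermediate model states, which may be inaccurate, are never evaluated individually.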

Code & Models

Repositories

HxLyn3/MPPVE
PyTorch · Official

Taxonomy

Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques