Model-based Reinforcement Learning with Multi-step Plan Value Estimation
Haoxin Lin, Yihao Sun, Jiaji Zhang, Yang Yu

TL;DR
This paper introduces MPPVE, a model-based reinforcement learning algorithm that uses multi-step plan value estimation to improve sample efficiency by better utilizing learned models despite their errors.
Contribution
It proposes a novel multi-step plan value estimation method and integrates it into a new algorithm, MPPVE, enhancing model-based RL performance.
Findings
MPPVE outperforms state-of-the-art model-based RL methods.
The approach improves sample efficiency in environments with model errors.
Multi-step plan evaluation enhances model utilization.
Abstract
A promising way to improve the sample efficiency of reinforcement learning is to use model-based methods, in which much of the exploration and evaluation can happen in the learned model, saving real-world samples. However, when the learned model has non-negligible error, sequential steps in the model are hard to evaluate accurately, which limits the model's utilization. This paper proposes to alleviate this issue by introducing multi-step plans to replace multi-step actions for model-based RL. We employ multi-step plan value estimation, which evaluates the expected discounted return after executing a sequence of action plans at a given state, and update the policy by directly computing the multi-step policy gradient via plan value estimation. The new model-based reinforcement learning algorithm MPPVE (Model-based Planning Policy Learning with Multi-step Plan Value Estimation) shows a…
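To make the idea of a plan value concrete, the following is a minimal sketch and not the authors' implementation. It scores a fixed sequence of actions by rolling it out in a model and summing discounted rewards, then bootstrapping from a value estimate at the final state. The names `plan_value`, `model_step`, `terminal_value`, and `toy_step` are hypothetical, and the toy model is deterministic, whereas the abstract describes an expected return, which in practice would be approximated by a learned estimator.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical types for illustration: scalar states and actions.
State = float
Action = float


def plan_value(
    state: State,
    plan: Sequence[Action],
    model_step: Callable[[State, Action], Tuple[State, float]],
    terminal_value: Callable[[State], float],
    gamma: float = 0.99,
) -> float:
    """Discounted return of executing `plan` from `state` in a model,
    plus a bootstrapped value at the state reached after the plan."""
    total = 0.0
    discount = 1.0
    for action in plan:
        state, reward = model_step(state, action)
        total += discount * reward
        discount *= gamma
    return total + discount * terminal_value(state)


def toy_step(state: State, action: Action) -> Tuple[State, float]:
    """Assumed toy dynamics: move along a line, pay the distance to 0."""
    next_state = state + action
    return next_state, -abs(next_state)


if __name__ == "__main__":
    # Score a two-step plan that walks from 2.0 to 0.0.
    value = plan_value(2.0, [-1.0, -1.0], toy_step, lambda s: 0.0, gamma=0.5)
    print(value)
```

The whole plan receives a single score here, rather than each action being scored one step at a time, which is the distinction the abstract draws between multi-step plans and multi-step actions.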
Taxonomy
Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques
