COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically for Model-Based RL

Xiyao Wang; Ruijie Zheng; Yanchao Sun; Ruonan Jia; Wichayaporn Wongkamjan; Huazhe Xu; Furong Huang

arXiv:2310.07220 · cs.LG · January 2, 2024

TL;DR

COPlanner is a planning framework for model-based reinforcement learning that balances conservative rollouts and optimistic exploration to mitigate model errors and improve sample efficiency and performance.

Contribution

It introduces a novel uncertainty-aware policy-guided model predictive control component to dynamically balance exploration and exploitation in model-based RL.

Findings

1. Significantly improves sample efficiency in control tasks.

2. Enhances asymptotic performance of model-based methods.

3. Effectively reduces impact of model prediction errors.

Abstract

Dyna-style model-based reinforcement learning contains two phases: model rollouts to generate samples for policy learning, and real-environment exploration using the current policy for dynamics model learning. However, because real-world environments are complex, the learned dynamics model is inevitably imperfect, and its prediction error can mislead policy learning and result in sub-optimal solutions. In this paper, we propose $COPlanner$, a planning-driven framework for model-based methods that addresses the problem of inaccurately learned dynamics models with conservative model rollouts and optimistic environment exploration. $COPlanner$ leverages an uncertainty-aware policy-guided model predictive control (UP-MPC) component to plan for multi-step uncertainty estimation. This estimated uncertainty then serves as a penalty during model rollouts and as a bonus…
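
The dual use of one uncertainty estimate, as a penalty in model rollouts and as a bonus in exploration, can be illustrated with a minimal sketch. This is not the paper's UP-MPC procedure: it collapses multi-step planning into a single scoring step, assumes ensemble disagreement (variance across model predictions) as the uncertainty measure, and uses hypothetical names (`disagreement`, `select_action`, `coef`) chosen only for illustration.

```python
from statistics import pvariance


def disagreement(ensemble_predictions):
    """Uncertainty proxy (assumption): population variance across the
    return predictions of several learned dynamics models."""
    return pvariance(ensemble_predictions)


def select_action(candidates, mode, coef=1.0):
    """Return the action with the best uncertainty-adjusted score.

    candidates: list of (action, predicted_return, uncertainty) tuples.
    mode: "rollout" subtracts the uncertainty (conservative),
          "explore" adds it (optimistic).
    coef: hypothetical weight on the uncertainty term.
    """
    if mode not in ("rollout", "explore"):
        raise ValueError("mode must be 'rollout' or 'explore'")
    sign = -1.0 if mode == "rollout" else 1.0
    best = max(candidates, key=lambda c: c[1] + sign * coef * c[2])
    return best[0]


# Two hypothetical candidate actions: "a" has a lower predicted return but
# the models agree on it; "b" looks better on average but the models disagree.
candidates = [
    ("a", 1.0, disagreement([0.9, 1.0, 1.1])),
    ("b", 1.2, disagreement([0.2, 1.2, 2.2])),
]

rollout_choice = select_action(candidates, "rollout")
explore_choice = select_action(candidates, "explore")
```

With these made-up numbers, the conservative rollout mode prefers the action the models agree on, while the optimistic exploration mode prefers the action with high disagreement, which is where new real-environment data would be most informative for the dynamics model.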

Taxonomy

Topics: Reinforcement Learning in Robotics