Reflect-then-Plan: Offline Model-Based Planning through a Doubly Bayesian Lens

Jihwan Jeong; Xiaoyu Wang; Jingmin Wang; Scott Sanner; Pascal Poupart

arXiv:2506.06261·cs.AI·June 9, 2025

Open Access

TL;DR

RefPlan introduces a doubly Bayesian approach to offline model-based reinforcement learning. It makes policies more robust and adaptive by updating a belief over environment dynamics during deployment.

Contribution

The paper proposes Reflect-then-Plan (RefPlan), a method that unifies uncertainty modeling and model-based planning by recasting planning as Bayesian posterior estimation, which improves the performance of conservative offline RL policies.

Findings

1. Significantly improves conservative offline RL policies.
2. Maintains robustness under high epistemic uncertainty.
3. Demonstrates resilience to environment changes.

Abstract

Offline reinforcement learning (RL) is crucial when online exploration is costly or unsafe but often struggles with high epistemic uncertainty due to limited data. Existing methods rely on fixed conservative policies, restricting adaptivity and generalization. To address this, we propose Reflect-then-Plan (RefPlan), a novel doubly Bayesian offline model-based (MB) planning approach. RefPlan unifies uncertainty modeling and MB planning by recasting planning as Bayesian posterior estimation. At deployment, it updates a belief over environment dynamics using real-time observations, incorporating uncertainty into MB planning via marginalization. Empirical results on standard benchmarks show that RefPlan significantly improves the performance of conservative offline RL policies. In particular, RefPlan maintains robust performance under high epistemic uncertainty and limited data, while…
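The deployment-time idea in the abstract (update a belief over dynamics from observed transitions, then plan by marginalizing over that belief) can be illustrated with a toy sketch. This is not RefPlan itself: the paper's actual inference and planning machinery is not described in this excerpt, so the sketch swaps in a discrete belief over three hypothetical dynamics parameters and a one-step lookahead planner. The names `update_belief`, `plan_action`, `thetas`, and the 1-D dynamics `s' = s + theta * a` are all assumptions made for illustration.

```python
import math

NEG_INF = float("-inf")


def gaussian_logpdf(x, mean, std):
    """Log density of a 1-D Gaussian N(mean, std^2) evaluated at x."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))


def update_belief(belief, thetas, s, a, s_next, noise_std=0.1):
    """'Reflect' step (toy version): Bayes' rule over candidate dynamics.

    Assumed dynamics for each candidate: s_next ~ N(s + theta * a, noise_std^2).
    """
    log_post = []
    for b, th in zip(belief, thetas):
        log_prior = math.log(b) if b > 0.0 else NEG_INF
        log_post.append(log_prior + gaussian_logpdf(s_next, s + th * a, noise_std))
    m = max(log_post)  # subtract the max for numerical stability
    weights = [math.exp(lp - m) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def plan_action(belief, thetas, s, goal, candidate_actions):
    """'Plan' step (toy version): one-step lookahead, marginalized over the belief.

    Reward is the negative squared distance of the predicted next state to the goal.
    """
    def expected_reward(a):
        return sum(b * -((s + th * a - goal) ** 2) for b, th in zip(belief, thetas))

    return max(candidate_actions, key=expected_reward)


# Hypothetical setup: three candidate dynamics, uniform prior belief.
thetas = [0.5, 1.0, 1.5]
belief = [1.0 / 3.0] * 3
actions = [0.0, 1.0, 2.0, 3.0]
goal = 3.0

# Plan under the prior, observe one transition from the true (hidden) theta = 1.5,
# update the belief, then re-plan from the same start state for comparison.
first = plan_action(belief, thetas, 0.0, goal, actions)
belief = update_belief(belief, thetas, 0.0, first, 0.0 + 1.5 * first)
second = plan_action(belief, thetas, 0.0, goal, actions)
print(first, second)
```

Under the uniform prior the planner hedges across all three candidates and picks action 3.0; after a single observed transition the belief concentrates on `theta = 1.5` and the planner switches to action 2.0, which reaches the goal exactly under that model. The point of the sketch is only that the chosen action changes as the belief changes, which is the adaptivity a fixed conservative policy lacks.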

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · AI-based Problem Solving and Planning