GOPlan: Goal-conditioned Offline Reinforcement Learning by Planning with Learned Models
Mianchu Wang, Rui Yang, Xi Chen, Hao Sun, Meng Fang, Giovanni Montana

TL;DR
GOPlan introduces a model-based offline goal-conditioned reinforcement learning framework that uses planning and learned models to generate high-quality imaginary trajectories, improving performance, data efficiency, and generalization to unseen goals.
Contribution
The paper presents GOPlan, a novel model-based approach with a prior policy and planning-based reanalysis for offline GCRL, addressing data limitations and OOD goal generalization.
Findings
Achieves state-of-the-art results on multi-goal tasks
Handles small data budgets effectively
Generalizes well to out-of-distribution goals
Abstract
Offline Goal-Conditioned RL (GCRL) offers a feasible paradigm for learning general-purpose policies from diverse and multi-task offline datasets. Despite notable recent progress, the predominant offline GCRL methods, mainly model-free, face constraints in handling limited data and generalizing to unseen goals. In this work, we propose Goal-conditioned Offline Planning (GOPlan), a novel model-based framework that contains two key phases: (1) pretraining a prior policy capable of capturing multi-modal action distribution within the multi-goal dataset; (2) employing the reanalysis method with planning to generate imagined trajectories for funetuning policies. Specifically, we base the prior policy on an advantage-weighted conditioned generative adversarial network, which facilitates distinct mode separation, mitigating the pitfalls of out-of-distribution (OOD) actions. For further policy…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsReinforcement Learning in Robotics · Robot Manipulation and Learning · Multimodal Machine Learning Applications
MethodsBalanced Selection
