GOPlan: Goal-conditioned Offline Reinforcement Learning by Planning with Learned Models

Mianchu Wang; Rui Yang; Xi Chen; Hao Sun; Meng Fang; Giovanni Montana

arXiv:2310.20025·cs.LG·May 17, 2024·2 cites

TL;DR

GOPlan introduces a model-based offline goal-conditioned reinforcement learning framework that uses planning and learned models to generate high-quality imaginary trajectories, improving performance, data efficiency, and generalization to unseen goals.

Contribution

The paper presents GOPlan, a novel model-based approach with a prior policy and planning-based reanalysis for offline GCRL, addressing data limitations and OOD goal generalization.

Findings

01 Achieves state-of-the-art results on multi-goal tasks

02 Handles small data budgets effectively

03 Generalizes well to out-of-distribution goals

Abstract

Offline Goal-Conditioned RL (GCRL) offers a feasible paradigm for learning general-purpose policies from diverse and multi-task offline datasets. Despite notable recent progress, the predominant offline GCRL methods, mainly model-free, face constraints in handling limited data and generalizing to unseen goals. In this work, we propose Goal-conditioned Offline Planning (GOPlan), a novel model-based framework that contains two key phases: (1) pretraining a prior policy capable of capturing the multi-modal action distribution within the multi-goal dataset; (2) employing the reanalysis method with planning to generate imagined trajectories for fine-tuning policies. Specifically, we base the prior policy on an advantage-weighted conditioned generative adversarial network, which facilitates distinct mode separation, mitigating the pitfalls of out-of-distribution (OOD) actions. For further policy…
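The abstract describes the prior policy as advantage-weighted but, in the truncated text available here, does not give the weighting formula. As a rough illustration only, the sketch below shows one common form of advantage weighting from the offline RL literature: exponentiated, clipped advantages used as per-sample training weights. Whether GOPlan uses exactly this form is an assumption, and the names `advantage_weights`, `normalize`, `beta`, and `max_weight` are hypothetical, not taken from the paper.

```python
import math


def advantage_weights(advantages, beta=1.0, max_weight=20.0):
    """Map advantages to positive weights exp(A / beta), clipped at max_weight.

    Assumption: this exponential form is a generic choice, not necessarily
    the one used by GOPlan. `beta` is a temperature: smaller values
    concentrate weight on the highest-advantage samples.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    log_cap = math.log(max_weight)
    weights = []
    for a in advantages:
        scaled = a / beta
        # Clip before exponentiating so large advantages cannot overflow.
        weights.append(max_weight if scaled >= log_cap else math.exp(scaled))
    return weights


def normalize(weights):
    """Rescale weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


# Toy advantages for three dataset actions (made-up numbers).
toy_advantages = [-1.0, 0.0, 2.0]
toy_weights = normalize(advantage_weights(toy_advantages))
```

Under this weighting, actions with higher estimated advantage contribute more to the training loss of the generative policy, while the clip keeps a few very large advantages from dominating a batch.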

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Multimodal Machine Learning Applications

Methods: Balanced Selection