PlanGAN: Model-based Planning With Sparse Rewards and Multiple Goals
Henry Charlesworth, Giovanni Montana

TL;DR
PlanGAN introduces a model-based planning approach using generative adversarial networks to efficiently solve multi-goal reinforcement learning tasks with sparse rewards, outperforming traditional model-free methods in sample efficiency.
Contribution
This work presents a novel model-based algorithm, PlanGAN, that leverages GANs for trajectory generation and planning in multi-goal sparse reward environments, improving sample efficiency.
Findings
Achieves comparable performance to model-free baselines
Is 4-8 times more sample efficient than the model-free baselines
Effective in robotic navigation and manipulation tasks
Abstract
Learning with sparse rewards remains a significant challenge in reinforcement learning (RL), especially when the aim is to train a policy capable of achieving multiple different goals. To date, the most successful approaches for dealing with multi-goal, sparse reward environments have been model-free RL algorithms. In this work we propose PlanGAN, a model-based algorithm specifically designed for solving multi-goal tasks in environments with sparse rewards. Our method builds on the fact that any trajectory of experience collected by an agent contains useful information about how to achieve the goals observed during that trajectory. We use this to train an ensemble of conditional generative models (GANs) to generate plausible trajectories that lead the agent from its current state towards a specified goal. We then combine these imagined trajectories into a novel planning algorithm in…
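The observation that a trajectory carries information about how to reach the goals seen along it can be illustrated with a small relabelling routine. The following is a minimal sketch, not the authors' implementation: it assumes states are plain tuples, and `achieved_goal` is a hypothetical mapping that takes the first two coordinates of a state as the goal it achieves. For each time step it pairs the current state with a goal that was actually reached later in the same trajectory, which is the kind of (state, goal, action, next state) data a goal-conditioned generative model could be trained on.

```python
import random
from typing import List, Sequence, Tuple

State = Tuple[float, ...]
Goal = Tuple[float, ...]
Action = Tuple[float, ...]
Sample = Tuple[State, Goal, Action, State]


def achieved_goal(state: State) -> Goal:
    """Hypothetical state-to-goal mapping: the first two coordinates (an assumption)."""
    return state[:2]


def relabel_trajectory(
    states: Sequence[State],
    actions: Sequence[Action],
    rng: random.Random,
    samples_per_step: int = 1,
) -> List[Sample]:
    """Pair each step with a goal achieved later in the same trajectory."""
    if len(states) != len(actions) + 1:
        raise ValueError("expected one more state than actions")
    data: List[Sample] = []
    last = len(states) - 1
    for t in range(len(actions)):
        for _ in range(samples_per_step):
            # randint is inclusive on both ends, so the goal comes from a strictly later state
            future = rng.randint(t + 1, last)
            goal = achieved_goal(states[future])
            data.append((states[t], goal, actions[t], states[t + 1]))
    return data


# Toy trajectory: (x, y, extra) states and 2-D actions, all values illustrative
states = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.1), (1.0, 1.0, 0.2), (2.0, 1.0, 0.3)]
actions = [(1.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
batch = relabel_trajectory(states, actions, random.Random(0))
```

Every tuple in `batch` is consistent by construction, since the relabelled goal was genuinely reached after the recorded state and action. This is why even trajectories that never hit the originally intended goal still provide a training signal under sparse rewards.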
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Evolutionary Algorithms and Applications
Methods: Experience Replay
