Generative Planning for Temporally Coordinated Exploration in Reinforcement Learning
Haichao Zhang, Wei Xu, Haonan Yu

TL;DR
This paper introduces the Generative Planning Method (GPM), a reinforcement learning approach that generates multi-step action plans to improve exploration, adaptability, and interpretability. GPM outperforms baseline methods in benchmark environments.
Contribution
GPM is the first method to generate and refine multi-step plans for temporally coordinated exploration in reinforcement learning.
Findings
GPM outperforms baseline methods in benchmark environments.
Multi-step plans improve exploration efficiency.
Plans offer interpretable agent intent.
Abstract
Standard model-free reinforcement learning algorithms optimize a policy that generates the action to take at the current time step in order to maximize expected future return. While flexible, this approach suffers from inefficient exploration because of its single-step nature. In this work, we present the Generative Planning Method (GPM), which generates actions not only for the current step but also for a number of future steps (hence the name generative planning). This brings GPM several benefits. First, since GPM is trained by maximizing value, the plans it generates can be regarded as intentional action sequences for reaching high-value regions. GPM can therefore leverage its generated multi-step plans for temporally coordinated exploration towards high-value regions, which is potentially more effective than a sequence of actions generated by perturbing each…
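The abstract contrasts two ways of exploring: perturbing each single-step action independently, versus executing a generated multi-step plan so that the perturbation stays consistent across several steps. The sketch below is a minimal toy illustration of that contrast only, not GPM's implementation. The 1-D dynamics s' = s + a, `toy_policy`, `toy_plan_generator`, `sigma`, and `horizon` are all hypothetical. In GPM the plan generator is a learned model trained by maximizing value; here it is replaced by a plan that repeats one noisy action, with the noise sampled once per plan. The sketch also commits to every plan until it ends and does not model plan refinement.

```python
import random
from typing import Callable, List

# Hypothetical 1-D toy setting: the state is a float and the dynamics are s' = s + a.
Policy = Callable[[float], float]
PlanGenerator = Callable[[float, int, float, random.Random], List[float]]


def toy_policy(state: float) -> float:
    """Hypothetical deterministic policy: a zero action, so only exploration noise moves the agent."""
    return 0.0


def toy_plan_generator(state: float, horizon: int, sigma: float,
                       rng: random.Random) -> List[float]:
    """Hypothetical stand-in for a learned plan generator (not GPM's model).

    Noise is sampled ONCE per plan and shared by every action in it, so the
    perturbation is temporally coordinated rather than independent per step.
    """
    offset = rng.gauss(0.0, sigma)
    return [toy_policy(state) + offset] * horizon


def per_step_exploration(policy: Policy, state: float, steps: int,
                         sigma: float, rng: random.Random) -> List[float]:
    """Baseline: perturb each single-step action with independent Gaussian noise."""
    trajectory = [state]
    for _ in range(steps):
        action = policy(state) + rng.gauss(0.0, sigma)
        state = state + action
        trajectory.append(state)
    return trajectory


def plan_based_exploration(plan_generator: PlanGenerator, state: float, steps: int,
                           horizon: int, sigma: float,
                           rng: random.Random) -> List[float]:
    """Generate a multi-step plan, execute it to the end, then generate the next one."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    trajectory = [state]
    while len(trajectory) - 1 < steps:
        remaining = steps - (len(trajectory) - 1)
        plan = plan_generator(state, horizon, sigma, rng)
        for action in plan[:remaining]:  # truncate the last plan at the step budget
            state = state + action
            trajectory.append(state)
    return trajectory


def mean_abs_displacement(run: Callable[[random.Random], List[float]],
                          trials: int, seed: int) -> float:
    """Average |final state - initial state| over independent seeded rollouts."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        traj = run(rng)
        total += abs(traj[-1] - traj[0])
    return total / trials


if __name__ == "__main__":
    steps, horizon, sigma = 20, 5, 0.1
    per_step = mean_abs_displacement(
        lambda r: per_step_exploration(toy_policy, 0.0, steps, sigma, r), 2000, 0)
    planned = mean_abs_displacement(
        lambda r: plan_based_exploration(toy_plan_generator, 0.0, steps, horizon, sigma, r),
        2000, 0)
    print(f"per-step: {per_step:.3f}  plan-based: {planned:.3f}")
```

With independent per-step noise, the final displacement after T steps has standard deviation √T·σ, whereas committing to plans of length H (with H dividing T) gives √(T·H)·σ. Under these toy assumptions the plan-based run therefore travels about √H times farther on average (√5 ≈ 2.2 for the settings above). Setting `horizon=1` recovers the per-step baseline exactly.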
Taxonomy
Topics: Reinforcement Learning in Robotics
