Generative Planning for Temporally Coordinated Exploration in Reinforcement Learning

Haichao Zhang; Wei Xu; Haonan Yu

arXiv:2201.09765 · cs.LG · February 7, 2022 · 1 citation



TL;DR

This paper introduces the Generative Planning Method (GPM), a reinforcement learning approach that generates multi-step action plans rather than single actions. The plans are intended to improve exploration, adaptability, and interpretability, and the paper reports that GPM outperforms baseline methods in benchmark environments.

Contribution

GPM is the first method to generate and refine multi-step plans for temporally coordinated exploration in reinforcement learning.

Findings

01. GPM outperforms baseline methods in benchmark environments.

02. Multi-step plans improve exploration efficiency.

03. Plans offer interpretable agent intent.

Abstract

Standard model-free reinforcement learning algorithms optimize a policy that generates the action to take at the current time step so as to maximize expected future return. While flexible, this approach explores inefficiently because of its single-step nature. In this work, we present the Generative Planning Method (GPM), which generates actions not only for the current step but also for a number of future steps (hence the term generative planning). This brings several benefits. First, since GPM is trained by maximizing value, the plans it generates can be regarded as intentional action sequences for reaching high-value regions. GPM can therefore leverage its generated multi-step plans for temporally coordinated exploration towards high-value regions, which is potentially more effective than a sequence of actions generated by perturbing each…
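To make the contrast in the abstract concrete, here is a minimal toy sketch in Python. It is not the paper's implementation: `generate_plan` is a hypothetical stand-in that samples one random direction and repeats it with small noise, whereas GPM's generator is learned by maximizing value. All function names, the 2-D point environment, and the parameters are assumptions made for illustration. The sketch only shows why committing to a multi-step action sequence tends to carry an agent further than perturbing every step independently.

```python
import numpy as np


def generate_plan(state, rng, horizon=5):
    """Hypothetical plan generator (NOT the learned GPM generator).

    Samples one unit direction and repeats it for `horizon` steps with
    small per-step noise. A learned generator would condition on `state`;
    here `state` is used only for its shape.
    """
    direction = rng.normal(size=state.shape)
    direction /= np.linalg.norm(direction)
    noise = 0.1 * rng.normal(size=(horizon,) + state.shape)
    return direction + noise  # shape: (horizon, dim)


def rollout_per_step(rng, steps=50, dim=2):
    """Single-step exploration: an independent random unit action each step."""
    state = np.zeros(dim)
    for _ in range(steps):
        action = rng.normal(size=dim)
        action /= np.linalg.norm(action)
        state = state + action
    return state


def rollout_with_plans(rng, steps=50, dim=2, horizon=5):
    """Temporally coordinated exploration: commit to each generated plan."""
    state = np.zeros(dim)
    t = 0
    while t < steps:
        plan = generate_plan(state, rng, horizon)
        # Truncate the last plan so the rollout has exactly `steps` actions.
        for action in plan[: steps - t]:
            state = state + action
            t += 1
    return state


def mean_displacement(rollout, seed=0, trials=200):
    """Average distance from the origin after one rollout, over many trials."""
    rng = np.random.default_rng(seed)
    return float(np.mean([np.linalg.norm(rollout(rng)) for _ in range(trials)]))


if __name__ == "__main__":
    print("per-step noise :", mean_displacement(rollout_per_step))
    print("multi-step plan:", mean_displacement(rollout_with_plans))
```

In this toy setting the plan-following agent ends up further from its starting point on average, because consecutive actions within a plan point the same way instead of cancelling out. Whether that extra coverage reaches high-value regions depends on how the plans are generated, which is the part this sketch leaves out.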


Code & Models

Repositories

Haichao-Zhang/generative-planning (PyTorch, official)

Videos

Generative Planning for Temporally Coordinated Exploration in Reinforcement Learning (SlidesLive)

Taxonomy

Topics: Reinforcement Learning in Robotics