Learn Goal-Conditioned Policy with Intrinsic Motivation for Deep Reinforcement Learning

Jinxin Liu; Donglin Wang; Qiangxing Tian; Zhengyu Chen

arXiv:2104.05043·cs.LG·December 14, 2021·1 cites

TL;DR

This paper introduces GPIM, a novel unsupervised approach enabling agents to learn goal-conditioned policies using intrinsic motivation, effectively discovering diverse goals and outperforming prior methods in robotic tasks.

Contribution

The paper proposes a new unsupervised learning framework, GPIM, that jointly learns abstract and goal-conditioned policies using intrinsic motivation without hand-crafted rewards.

Findings

01. GPIM outperforms prior techniques in robotic tasks.

02. The method effectively discovers diverse perceptually-specific goals.

03. It demonstrates efficiency and effectiveness in goal-conditioned policy learning.

Abstract

It is of significance for an agent to learn a widely applicable and general-purpose policy that can achieve diverse goals including images and text descriptions. Considering such perceptually-specific goals, the frontier of deep reinforcement learning research is to learn a goal-conditioned policy without hand-crafted rewards. To learn this kind of policy, recent works usually take as the reward the non-parametric distance to a given goal in an explicit embedding space. From a different viewpoint, we propose a novel unsupervised learning approach named goal-conditioned policy with intrinsic motivation (GPIM), which jointly learns both an abstract-level policy and a goal-conditioned policy. The abstract-level policy is conditioned on a latent variable to optimize a discriminator and discovers diverse states that are further rendered into perceptually-specific goals for the…
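The two reward styles contrasted in the abstract can be illustrated with a minimal sketch. The abstract is truncated here and does not state GPIM's exact objective, so everything below is an assumption: `discriminator_reward` uses a common discriminator-based form, log q(z | s) - log p(z) with a uniform prior over a discrete latent, and `distance_reward` uses negative Euclidean distance as a stand-in for the "non-parametric distance in an explicit embedding space" used by prior work. All function and parameter names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discriminator_reward(logits, z, num_latents):
    """Assumed intrinsic reward: log q(z | s) - log p(z).

    `logits` are hypothetical discriminator outputs for one state,
    `z` is the index of the latent the abstract-level policy was
    conditioned on, and p(z) is assumed uniform over `num_latents`.
    """
    q = softmax(logits)
    return math.log(q[z]) - math.log(1.0 / num_latents)


def distance_reward(state_embedding, goal_embedding):
    """Assumed prior-work style reward: negative Euclidean distance
    between a state embedding and a goal embedding."""
    return -math.dist(state_embedding, goal_embedding)
```

Under these assumptions, a discriminator that cannot tell latents apart yields zero reward, while one that confidently identifies the conditioning latent yields a positive reward, which is what pushes different latents toward distinguishable states.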

Taxonomy

Topics: Reinforcement Learning in Robotics