DisCo RL: Distribution-Conditioned Reinforcement Learning for General-Purpose Policies

Soroush Nasiriany; Vitchyr H. Pong; Ashvin Nair; Alexander Khazatsky; Glen Berseth; Sergey Levine

arXiv:2104.11707 · cs.LG · April 26, 2021

TL;DR

DisCo RL introduces goal distributions as a flexible task representation for reinforcement learning, enabling the learning of general-purpose policies that generalize across diverse tasks, especially in robot manipulation.

Contribution

The paper proposes goal distributions as a task representation for contextual policies and develops DisCo RL, an off-policy algorithm that learns general-purpose policies conditioned on those distributions.

Findings

01

DisCo RL outperforms prior methods on robot manipulation tasks.

02

It effectively generalizes to new goal distributions.

03

The approach balances expressivity and learnability.

Abstract

Can we use reinforcement learning to learn general-purpose policies that can perform a wide range of different tasks, resulting in flexible and reusable skills? Contextual policies provide this capability in principle, but the representation of the context determines the degree of generalization and expressivity. Categorical contexts preclude generalization to entirely new tasks. Goal-conditioned policies may enable some generalization, but cannot capture all tasks that might be desired. In this paper, we propose goal distributions as a general and broadly applicable task representation suitable for contextual policies. Goal distributions are general in the sense that they can represent any state-based reward function when equipped with an appropriate distribution class, while the particular choice of distribution class allows us to trade off expressivity and learnability. We develop an…
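To make the idea of a goal distribution acting as a reward function more concrete, here is a minimal sketch. It assumes a Gaussian distribution class and assumes the reward is the log-density of the state under the goal distribution; neither choice is stated in the excerpt above, and the function names are hypothetical. The covariance shows the trade-off the abstract mentions: a small variance along a dimension makes that dimension matter, while a large variance makes the task nearly indifferent to it.

```python
import numpy as np


def gaussian_log_density(state, mean, cov):
    """Log-density of `state` under a multivariate normal N(mean, cov)."""
    state = np.asarray(state, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    diff = state - mean
    k = mean.shape[0]
    # slogdet is numerically safer than log(det(cov)).
    _, logdet = np.linalg.slogdet(cov)
    # Mahalanobis term, computed without explicitly inverting cov.
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (k * np.log(2.0 * np.pi) + logdet + maha)


def goal_distribution_reward(state, goal_mean, goal_cov):
    """Assumed reward: log-likelihood of the state under the goal distribution."""
    return gaussian_log_density(state, goal_mean, goal_cov)


# Hypothetical 2-D task: the x-coordinate must be near 1.0,
# while the y-coordinate is almost irrelevant (very large variance).
goal_mean = np.array([1.0, 0.0])
goal_cov = np.diag([0.01, 100.0])

r_at_goal = goal_distribution_reward([1.0, 0.0], goal_mean, goal_cov)
r_off_in_y = goal_distribution_reward([1.0, 5.0], goal_mean, goal_cov)
r_off_in_x = goal_distribution_reward([2.0, 0.0], goal_mean, goal_cov)
```

Under these assumptions, a large error in y costs far less reward than a smaller error in x. A single point goal cannot express that kind of partial specification, which is the gap the abstract attributes to goal-conditioned policies.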
