DisCo RL: Distribution-Conditioned Reinforcement Learning for General-Purpose Policies
Soroush Nasiriany, Vitchyr H. Pong, Ashvin Nair, Alexander Khazatsky, Glen Berseth, Sergey Levine

TL;DR
DisCo RL introduces goal distributions as a flexible task representation for reinforcement learning, enabling the learning of general-purpose policies that generalize across diverse tasks, especially in robot manipulation.
Contribution
The paper proposes goal distributions for task representation and develops DisCo RL, an off-policy algorithm that effectively learns general-purpose policies with broad task generalization.
Findings
DisCo RL outperforms prior methods on robot manipulation tasks.
It effectively generalizes to new goal distributions.
The choice of distribution class lets the approach trade off expressivity against learnability.
Abstract
Can we use reinforcement learning to learn general-purpose policies that can perform a wide range of different tasks, resulting in flexible and reusable skills? Contextual policies provide this capability in principle, but the representation of the context determines the degree of generalization and expressivity. Categorical contexts preclude generalization to entirely new tasks. Goal-conditioned policies may enable some generalization, but cannot capture all tasks that might be desired. In this paper, we propose goal distributions as a general and broadly applicable task representation suitable for contextual policies. Goal distributions are general in the sense that they can represent any state-based reward function when equipped with an appropriate distribution class, while the particular choice of distribution class allows us to trade off expressivity and learnability. We develop an…
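To make the idea of a goal distribution as a task representation more concrete, here is a minimal sketch. It assumes a Gaussian distribution class and assumes that the log-density of the current state under the goal distribution is used as the reward; both choices are illustrative and are not stated in the text above. The function name `gaussian_log_density` and the 2-D state are hypothetical.

```python
import numpy as np

def gaussian_log_density(state, mean, cov):
    """Log-density of a multivariate Gaussian goal distribution at `state`.

    Assumption for illustration: this value is used as the reward.
    """
    state = np.asarray(state, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    diff = state - mean
    k = mean.shape[0]
    _, logdet = np.linalg.slogdet(cov)         # log-determinant of the covariance
    maha = diff @ np.linalg.solve(cov, diff)   # squared Mahalanobis distance
    return -0.5 * (k * np.log(2.0 * np.pi) + logdet + maha)

# Hypothetical 2-D state: (x, y) position of an object.
mean = np.array([1.0, 0.0])
# Small variance in x, large variance in y: "reach x = 1; y barely matters".
cov = np.diag([0.01, 100.0])

reward_on_target = gaussian_log_density([1.0, 5.0], mean, cov)   # x correct, y far from mean
reward_off_target = gaussian_log_density([0.0, 0.0], mean, cov)  # y correct, x wrong
print(reward_on_target > reward_off_target)
```

Under these assumptions, a single-goal task corresponds to a narrow distribution in every dimension, while widening the variance in one dimension tells the policy that dimension is unimportant. More generally, for a state-based reward r(s) with a finite normalizer, the density p(s) ∝ exp(r(s)) has log-density equal to r(s) up to an additive constant, which is one way to read the claim that goal distributions can represent any state-based reward function.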
