Contrastive Value Learning: Implicit Models for Simple Offline RL

Bogdan Mazoure; Benjamin Eysenbach; Ofir Nachum; Jonathan Tompson

arXiv:2211.02100·cs.LG·November 7, 2022

Open Access

TL;DR

Contrastive Value Learning (CVL) is an implicit multi-step environment model for offline RL. The model is learned without access to reward functions and directly estimates action values without TD learning, and it outperforms prior offline RL methods on continuous control benchmarks.

Contribution

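The paper proposes CVL, an implicit multi-step model of the environment dynamics that directly indicates the value of each action. This replaces the usual approach of learning a 1-step dynamics model that must then be integrated into a larger RL framework, and it requires no TD learning.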

Findings

01. CVL outperforms prior offline RL methods on continuous control benchmarks.

02. The implicit model scales to high-dimensional tasks.

03. The model is learned without access to reward functions, and action values are estimated without TD learning.

Abstract

Model-based reinforcement learning (RL) methods are appealing in the offline setting because they allow an agent to reason about the consequences of actions without interacting with the environment. Prior methods learn a 1-step dynamics model, which predicts the next state given the current state and action. These models do not immediately tell the agent which actions to take, but must be integrated into a larger RL framework. Can we model the environment dynamics in a different way, such that the learned model does directly indicate the value of each action? In this paper, we propose Contrastive Value Learning (CVL), which learns an implicit, multi-step model of the environment dynamics. This model can be learned without access to reward functions, but nonetheless can be used to directly estimate the value of each action, without requiring any TD learning. Because this model represents…
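To make the idea of an implicit, contrastively trained model more concrete, here is a minimal sketch. It is not the authors' implementation: it assumes an InfoNCE-style objective over a hypothetical critic score `scores[i][j]` for the pair (state-action `i`, future state `j`), and it assumes one possible reward-weighted estimator for turning those scores into an action value. The function names and the estimator are illustrative assumptions, not details taken from the abstract.

```python
import math


def info_nce_loss(scores):
    """Average contrastive (InfoNCE-style) loss over a square score matrix.

    scores[i][j] is a hypothetical critic output for state-action pair i and
    candidate future state j. Column i is assumed to be the positive for row i
    (a future state from the same trajectory); other columns are negatives.
    """
    total = 0.0
    for i, row in enumerate(scores):
        peak = max(row)  # subtract the max for a stable log-sum-exp
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_norm - row[i]
    return total / len(scores)


def value_estimate(scores_row, rewards):
    """Assumed estimator: reward of each candidate future state, weighted by
    the softmax of the critic scores. No TD bootstrapping is involved."""
    peak = max(scores_row)
    weights = [math.exp(x - peak) for x in scores_row]
    norm = sum(weights)
    return sum(w * r for w, r in zip(weights, rewards)) / norm
```

With uninformative scores such as `[[0.0, 0.0], [0.0, 0.0]]` the loss equals `math.log(2)`, and it falls as the diagonal (positive) scores grow relative to the rest. Rewards enter only in `value_estimate`, which mirrors the abstract's point that the model itself can be learned without them.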

Taxonomy

Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques