Contrastive Value Learning: Implicit Models for Simple Offline RL
Bogdan Mazoure, Benjamin Eysenbach, Ofir Nachum, Jonathan Tompson

TL;DR
Contrastive Value Learning (CVL) learns an implicit, multi-step model of environment dynamics for offline RL. The model can be learned without access to reward functions and is used to estimate action values directly, without TD learning; it is reported to outperform prior offline RL methods on continuous control benchmarks.
Contribution
The paper proposes CVL, an implicit multi-step environment model that directly indicates the value of each action, in place of a 1-step dynamics model that must be integrated into a larger RL framework.
Findings
CVL outperforms prior offline RL methods on continuous control benchmarks.
The implicit model scales to high-dimensional tasks.
The CVL model can be learned without access to reward functions, and estimating action values from it requires no TD learning.
Abstract
Model-based reinforcement learning (RL) methods are appealing in the offline setting because they allow an agent to reason about the consequences of actions without interacting with the environment. Prior methods learn a 1-step dynamics model, which predicts the next state given the current state and action. These models do not immediately tell the agent which actions to take, but must be integrated into a larger RL framework. Can we model the environment dynamics in a different way, such that the learned model does directly indicate the value of each action? In this paper, we propose Contrastive Value Learning (CVL), which learns an implicit, multi-step model of the environment dynamics. This model can be learned without access to reward functions, but nonetheless can be used to directly estimate the value of each action, without requiring any TD learning. Because this model represents…
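The abstract above is truncated and does not state the training objective, so the following is only a rough sketch of how an implicit multi-step model might be trained contrastively. It assumes two things that are not confirmed by the text here: future states are sampled at a geometrically distributed offset (mimicking discounting with factor gamma), and the model is scored with an InfoNCE-style loss over a batch. The function names `sample_future_index` and `infonce_loss` are hypothetical, and plain NumPy arrays stand in for learned representations.

```python
import numpy as np


def sample_future_index(t, horizon, gamma, rng):
    """Pick the index of a future state in a trajectory of length `horizon`.

    Assumption: the offset is geometric with success probability 1 - gamma,
    so nearby future states are more likely than distant ones.
    """
    offset = rng.geometric(1.0 - gamma)  # integer >= 1
    return min(t + offset, horizon - 1)  # clip to the end of the trajectory


def infonce_loss(sa_repr, future_repr):
    """InfoNCE-style loss over a batch (assumed objective, not from the paper).

    sa_repr:     (B, D) representations of (state, action) pairs
    future_repr: (B, D) representations of sampled future states
    Row i's positive is column i; the other columns act as negatives.
    """
    logits = sa_repr @ future_repr.T                      # (B, B) similarity scores
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(sample_future_index(t=3, horizon=10, gamma=0.9, rng=rng))
    print(infonce_loss(rng.normal(size=(4, 8)), rng.normal(size=(4, 8))))
```

In this sketch no next-state prediction is made explicitly: the model only scores how likely a future state is to follow a given state-action pair, which is the sense in which such a model is "implicit". How those scores are turned into action values is not covered by the truncated abstract and is left out here.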
Taxonomy
Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques
