Contrastive Difference Predictive Coding

Chongyi Zheng; Ruslan Salakhutdinov; Benjamin Eysenbach

arXiv:2310.20141·cs.LG·October 10, 2025·1 cites

TL;DR

This paper introduces a temporal difference contrastive predictive coding method that improves data efficiency in learning long-term dependencies for time-series prediction and goal-conditioned reinforcement learning.

Contribution

It presents a novel temporal difference approach to contrastive predictive coding, reducing data requirements for learning long-term dependencies in time series.

Findings

01 · Achieves a 2x median improvement in success rates over prior methods in goal-conditioned RL

02 · Outperforms prior methods in stochastic environments

03 · Significantly more sample efficient in tabular settings

Abstract

Predicting and reasoning about the future lie at the heart of many time-series questions. For example, goal-conditioned reinforcement learning can be viewed as learning representations to predict which states are likely to be visited in the future. While prior methods have used contrastive predictive coding to model time series data, learning representations that encode long-term dependencies usually requires large amounts of data. In this paper, we introduce a temporal difference version of contrastive predictive coding that stitches together pieces of different time series data to decrease the amount of data required to learn predictions of future events. We apply this representation learning method to derive an off-policy algorithm for goal-conditioned RL. Experiments demonstrate that, compared with prior RL methods, ours achieves $2 \times$ median improvement in success rates and…
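As background for the abstract, the standard (Monte Carlo) form of contrastive predictive coding can be sketched as an InfoNCE loss over a batch: each anchor representation is scored against the representation of a future state sampled from its own trajectory (the positive) and against the futures of the other batch items (the negatives). This is a minimal sketch in plain Python, not the authors' code; the function names, the inner-product critic, and the toy inputs are assumptions made for illustration.

```python
import math


def dot(u, v):
    """Inner product used here as a stand-in critic (an assumption)."""
    return sum(a * b for a, b in zip(u, v))


def log_softmax(row):
    """Numerically stable log-softmax of a list of floats."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def infonce_loss(anchors, futures):
    """Monte Carlo InfoNCE loss.

    anchors[i]: representation of a (state, action) pair (hypothetical input).
    futures[i]: representation of a future state from the same trajectory;
                futures[j] with j != i serve as negatives for anchor i.
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [dot(anchors[i], futures[j]) for j in range(n)]
        total -= log_softmax(logits)[i]  # the positive sits at index i
    return total / n


# Toy usage: representations aligned with their own futures give a low loss.
aligned = infonce_loss([[5.0, 0.0], [0.0, 5.0]], [[1.0, 0.0], [0.0, 1.0]])
uninformative = infonce_loss([[0.0, 0.0], [0.0, 0.0]], [[1.0, 2.0], [3.0, 4.0]])
print(aligned < uninformative)
```

Because every positive must come from the same trajectory as its anchor, this estimator only learns from future states that were actually observed after each anchor, which is the data requirement the abstract says the temporal difference version is designed to reduce.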

Peer Reviews

Decision · ICLR 2024 poster

Reviewer 01 · Rating 8 (accept, good paper) · Confidence 3

Strengths

- The paper proposes a new temporal difference (TD) estimator for the InfoNCE loss, which is shown to be more efficient than the standard (Monte Carlo) estimator.
- The proposed goal-conditioned reinforcement learning (RL) algorithm outperforms prior methods in both online and offline settings.
- The proposed algorithm is capable of handling stochasticity in the environment dynamics.
- In stochastic tasks, there is an excellent improvement in performance versus the baseline of Quasimetric RL,

Weaknesses

- The paper focuses on fairly trivial environments. It would be nice to see these methods working on more challenging, higher-dimensional goal-conditioned RL tasks, as it's not a given that these gains will carry over to tasks that matter a lot more.
- The proposed TD estimator is more complex than the standard (Monte Carlo) estimator, and its implementation requires more hyperparameters.
- The performance of the proposed goal-conditioned RL algorithm on the most challenging tasks was less than 5

Reviewer 02 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

The derived method fits nicely within the literature and seems to fill a gap between contrastive objectives from self-supervised learning and more online-focused temporal-difference updates.

Weaknesses

After the rebuttal, I'm raising my score. While I believe there are still issues with the empirical section, these are issues the rest of the literature is also facing. I don't think rejecting this paper is a way to a solution. I also appreciate the fix to some of the inaccurate statements that were overlooked! Great job authors!

Before edit:

This paper struggles with clarity and accuracy in some of the ancillary statements made about the literature surrounding the paper and in th

Reviewer 03 · Rating 8 (accept, good paper) · Confidence 3

Strengths

The paper is mostly well written, apart from some details (see questions section). The derivations are sound. Experimental results show strong performance compared to previous methods. The paper also presents some analysis and insights to explain the performance.

Weaknesses

The novelty is slightly limited. The idea of using InfoNCE to estimate the state occupancy measure has been presented in contrastive RL; the Bellman-like update and the use of importance weights have been presented in C-Learning.
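The reviews describe a TD estimator for InfoNCE built on a Bellman-like update with importance weights. One simplified reading of that idea is sketched below: the usual one-hot contrastive label is replaced by a mixture of a one-hot label on the observed next state (weight 1 - gamma) and bootstrapped weights computed from the critic at the next state-action pair (weight gamma), held fixed as a target. This is an illustrative sketch and not the authors' implementation; all function names, the use of softmax weights as the bootstrapped term, and the toy numbers are assumptions.

```python
import math


def softmax(row):
    """Numerically stable softmax of a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    z = sum(exps)
    return [e / z for e in exps]


def td_target(next_logits, next_index, gamma):
    """Soft label over candidate future states (hypothetical helper).

    next_logits[j]: critic score at the NEXT state-action pair for candidate j,
                    treated as a constant (no gradient would flow through it).
    next_index:     position of the observed next state among the candidates.
    """
    boot = softmax(next_logits)  # assumed stand-in for the importance weights
    return [
        (1.0 - gamma) * (1.0 if j == next_index else 0.0) + gamma * boot[j]
        for j in range(len(next_logits))
    ]


def td_contrastive_loss(logits, next_logits, next_index, gamma):
    """Cross-entropy between the critic's softmax at (s, a) and the TD target."""
    target = td_target(next_logits, next_index, gamma)
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(t * (x - lse) for t, x in zip(target, logits))


# Toy usage with four candidate states and uniform next-step scores.
target = td_target([0.0, 0.0, 0.0, 0.0], next_index=1, gamma=0.5)
print(target)
```

With gamma set to 0 the target collapses to a one-hot label on the next state, so the loss reduces to an ordinary InfoNCE term; larger gamma shifts weight onto the bootstrapped term, which is how predictions can be stitched together across transitions that never appeared in the same trajectory.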

Code & Models

Repositories

chongyi-zheng/td_infonce · JAX · Official

Videos

Contrastive Difference Predictive Coding · SlidesLive

Taxonomy

Topics: Data Stream Mining Techniques · Mental Health Research Topics · Reinforcement Learning in Robotics

Methods: InfoNCE · Contrastive Predictive Coding