Episodic Novelty Through Temporal Distance
Yuhua Jiang, Qihan Liu, Yiqin Yang, Xiaoteng Ma, Dianyu Zhong, Hao Hu, Jun Yang, Bin Liang, Bo Xu, Chongjie Zhang, Qianchuan Zhao

TL;DR
This paper introduces ETD, a new method using temporal distance and contrastive learning to improve exploration in sparse reward environments with varying contexts, outperforming existing approaches.
Contribution
The paper presents a novel temporal distance metric and contrastive learning framework for intrinsic motivation in CMDPs, addressing limitations of count-based and similarity-based methods.
Findings
ETD outperforms state-of-the-art exploration methods on benchmark tasks.
Temporal distance effectively captures state novelty in sparse reward environments.
Contrastive learning enhances the accuracy of temporal distance estimation.
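To make the contrastive-learning finding concrete, here is a minimal sketch of a generic InfoNCE-style objective over state embeddings. It is an illustration under stated assumptions, not necessarily the loss used in the paper: the function name `info_nce_loss`, the use of negative squared Euclidean distance as the similarity score, and the pairing of each anchor with one future state from the same trajectory are all hypothetical choices.

```python
import numpy as np

def info_nce_loss(anchors, positives):
    """Generic InfoNCE loss (illustrative assumption, not the paper's exact loss).

    anchors[i] is the embedding of a state, positives[i] the embedding of a
    state reached later in the same trajectory. Every positives[j] with j != i
    serves as a negative for anchors[i].
    """
    # Pairwise scores: negative squared Euclidean distance between embeddings.
    diff = anchors[:, None, :] - positives[None, :, :]
    logits = -np.sum(diff ** 2, axis=-1)
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The matching pair for row i sits on the diagonal.
    return float(-np.mean(np.diag(log_probs)))
```

Minimizing this loss pulls temporally close pairs together and pushes unrelated pairs apart, which is the general mechanism by which a contrastive objective can shape a distance-like quantity between states.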
Abstract
Exploration in sparse reward environments remains a significant challenge in reinforcement learning, particularly in Contextual Markov Decision Processes (CMDPs), where environments differ across episodes. Existing episodic intrinsic motivation methods for CMDPs primarily rely on count-based approaches, which are ineffective in large state spaces, or on similarity-based methods that lack appropriate metrics for state comparison. To address these shortcomings, we propose Episodic Novelty Through Temporal Distance (ETD), a novel approach that introduces temporal distance as a robust metric for state similarity and intrinsic reward computation. By employing contrastive learning, ETD accurately estimates temporal distances and derives intrinsic rewards based on the novelty of states within the current episode. Extensive experiments on various benchmark tasks demonstrate that ETD…
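The idea of an episodic novelty bonus can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class `EpisodicNoveltyBonus`, the `placeholder_distance` function (plain Euclidean distance standing in for a learned temporal distance), the choice of the minimum over the episodic memory as the aggregation, and the default `beta` are all hypothetical. The combination of extrinsic reward and weighted bonus follows the decomposition $r_t^e + \beta b_t$ quoted in the reviews.

```python
import numpy as np

def placeholder_distance(x, y):
    """Stand-in for a learned temporal distance (assumption: Euclidean)."""
    return float(np.linalg.norm(np.asarray(x, dtype=float) - np.asarray(y, dtype=float)))

class EpisodicNoveltyBonus:
    """Hypothetical episodic memory that scores how novel a new state is."""

    def __init__(self, distance_fn=placeholder_distance):
        self.distance_fn = distance_fn
        self.memory = []

    def reset(self, initial_state):
        # The memory is cleared at the start of every episode.
        self.memory = [initial_state]

    def bonus(self, next_state):
        # Assumed aggregation: distance to the closest state seen this episode.
        b = min(self.distance_fn(s, next_state) for s in self.memory)
        self.memory.append(next_state)
        return b

def total_reward(extrinsic, bonus, beta=0.1):
    # Extrinsic reward plus weighted intrinsic bonus; beta is a hypothetical default.
    return extrinsic + beta * bonus
```

With this sketch, visiting the same state twice in one episode yields a positive bonus the first time and zero the second time, so the bonus depends on what has already been visited in the episode rather than on the current transition alone.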
Peer Reviews
Decision: ICLR 2025 Poster
1. The paper introduces a novel method to use temporal distance as a (quasi)metric for state similarity.
2. The paper conducts extensive experiments across multiple CMDP environments, comparing ETD to several baseline methods.
3. ETD + PPO demonstrates robust performance improvements, especially in challenging sparse reward scenarios.
4. Results on extensive experiments across multiple CMDP environments, comparing ETD to several baseline methods have been reported.
5. The paper is well-structure
1. The proposed ETD doesn’t take extrinsic rewards into consideration when computing similarity. Intuitively, states with similar rewards could be considered similar in terms of the task objective [1].
2. The approach has been primarily tested on discrete action spaces, and its effectiveness in continuous action domains such as MuJoCo [2], DeepMind Control Suite [3], or Fetch [4] environments remains unexplored.
[1] Agarwal, Rishabh, et al. “Contrastive behavioral similarity embeddings for general
- The paper is well-written, with a clear and cohesive narrative. Most technical details are effectively conveyed through illustrative figures and results from intuitive toy tasks.
- The experimental tasks are appropriately chosen, providing sufficient complexity to evaluate the approach.
- The experimental results, along with ablation studies, clearly demonstrate the advantages of the proposed method over other baselines.
- The paper offers a comprehensive analysis and comparison of different t
## MDP assumption
The definition of the intrinsic bonus reward violates the MDP assumption. In section 2, the total reward $r(s_t, a_t, s_{t+1})$ is decomposed as the environment reward $r_t^e$ plus the weighted bonus $\beta b_t$. However, $b_t$ is a function that depends on the states visited within the episode, which disrupts the definition of total reward and violates the MDP assumption. In this case, the visited states influence the intrinsic reward, potentially harming the policy built u
- A well-motivated new intrinsic reward signal for RL
- The concept is simple and provides strong performance
- Exhaustive evaluations, good comparisons with the baseline methods, as well as nice ablations
- Good insights into why baseline methods fail
The paper is already of high quality and I could not find major weaknesses. Minor weaknesses are given in the questions section.
Taxonomy
Topics: Computational and Text Analysis Methods
