Discovering Intrinsic Reward with Contrastive Random Walk
Zixuan Pan, Zihao Wei, Yidong Huang, Aditya Gupta

TL;DR
This paper introduces Contrastive Random Walk, a novel intrinsic reward method that enhances learning efficiency and robustness in sparse reward environments by learning meaningful state representations.
Contribution
It proposes a new curiosity-driven approach using contrastive random walks to improve convergence speed and robustness in reinforcement learning.
Findings
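Receives the highest reward among the compared methods within the same number of iterations in non-tabular sparse reward scenarios.
Reaches high reward in fewer iterations.
Performance changes little across different random initializations of the environment.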
Abstract
The aim of this paper is to demonstrate the efficacy of using Contrastive Random Walk as a curiosity method to achieve faster convergence to the optimal policy. Contrastive Random Walk defines the transition matrix of a random walk with the help of neural networks. It learns a meaningful state representation with a closed loop. The loss of Contrastive Random Walk serves as an intrinsic reward and is added to the environment reward. Our method works well in non-tabular sparse reward scenarios, in the sense that it receives the highest reward within the same number of iterations compared to other methods. Contrastive Random Walk is also more robust: its performance does not change much across different random initializations of the environment. We also find that adaptive restart and an appropriate temperature are crucial to the performance of Contrastive Random Walk.
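As a rough illustration of the mechanism the abstract describes, the sketch below builds softmax transition matrices from state embeddings, chains them along a forward-then-backward (closed-loop) walk, and uses the resulting cycle loss as an intrinsic reward added to the environment reward. It is a minimal sketch under stated assumptions, not the authors' implementation: the embeddings stand in for the output of a neural network encoder, and the function names, the `(T, N, D)` embedding layout, the cosine similarity, the default temperature, and the `beta` weight are all hypothetical choices made for illustration.

```python
import numpy as np

def row_softmax(logits):
    # Numerically stable softmax over each row.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def transition_matrix(emb_a, emb_b, temperature):
    # Assumption: transition probabilities are a temperature-scaled softmax
    # over cosine similarities between embeddings at two consecutive steps.
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    return row_softmax(a @ b.T / temperature)

def cycle_loss(embeddings, temperature=0.1):
    # embeddings: array of shape (T, N, D) with T time steps, N nodes per
    # step, and D embedding dimensions (hypothetical layout).
    num_steps, num_nodes, _ = embeddings.shape
    # Closed loop: walk forward 0..T-1, then back to step 0.
    order = list(range(num_steps)) + list(range(num_steps - 2, -1, -1))
    walk = np.eye(num_nodes)
    for t, t_next in zip(order[:-1], order[1:]):
        walk = walk @ transition_matrix(embeddings[t], embeddings[t_next], temperature)
    # Cross-entropy against the identity: each node should return to itself.
    return -np.log(np.clip(np.diag(walk), 1e-12, 1.0))

def shaped_reward(env_reward, intrinsic_reward, beta=1.0):
    # The abstract says the loss is added to the environment reward;
    # beta is a hypothetical scaling weight, not taken from the paper.
    return env_reward + beta * intrinsic_reward
```

For example, `cycle_loss(np.random.default_rng(0).normal(size=(4, 5, 8)), temperature=0.5)` returns one non-negative loss per node. A lower temperature sharpens each transition distribution, which is one plausible reason the abstract singles out temperature as important.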
Taxonomy
Topics: Machine Learning in Materials Science · Neural dynamics and brain function · Neural Networks and Applications
