Self-Supervised Exploration via Temporal Inconsistency in Reinforcement Learning
Zijian Gao, Kele Xu, Yuanzhao Zhai, Dawei Feng, Bo Ding, XinJun Mao, Huaimin Wang

TL;DR
This paper introduces a novel intrinsic reward mechanism for reinforcement learning that leverages temporal inconsistency in self-supervised models, inspired by human curiosity, to improve learning under sparse reward conditions.
Contribution
The paper proposes a new intrinsic reward based on temporal inconsistency of self-supervised predictions, with a variational weighting mechanism, outperforming existing methods without extra training costs.
Findings
Outperforms other intrinsic reward methods on benchmarks
Demonstrates higher noise tolerance
Requires no additional training costs
Abstract
Under sparse extrinsic reward settings, reinforcement learning has remained challenging, despite surging interest in this field. Previous attempts suggest that intrinsic reward can alleviate the issue caused by sparsity. In this article, we present a novel intrinsic reward that is inspired by human learning, as humans evaluate curiosity by comparing current observations with historical knowledge. Our method involves training a self-supervised prediction model, saving snapshots of the model parameters, and using the nuclear norm to evaluate the temporal inconsistency between the predictions of different snapshots as intrinsic rewards. We also propose a variational weighting mechanism to assign weights to different snapshots in an adaptive manner. Our experimental results on various benchmark environments demonstrate the efficacy of our method, which outperforms other intrinsic reward-based…
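The snapshot-comparison idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `temporal_inconsistency_reward`, the centering step, and the way `weights` is applied are assumptions made for illustration, since this summary does not give the paper's exact formulation. The sketch stacks the predictions of K snapshots for one observation into a K x D matrix, subtracts the mean prediction so that identical snapshots yield zero, and returns the nuclear norm (sum of singular values) of the result.

```python
import numpy as np


def temporal_inconsistency_reward(snapshot_predictions, weights=None):
    """Hypothetical intrinsic reward from disagreement between model snapshots.

    snapshot_predictions: array-like of shape (K, D), one prediction per snapshot.
    weights: optional array-like of shape (K,), a stand-in for the adaptive
             per-snapshot weights mentioned in the abstract (assumption).
    """
    preds = np.asarray(snapshot_predictions, dtype=np.float64)
    if preds.ndim != 2:
        raise ValueError("expected a (K, D) matrix of snapshot predictions")

    # Assumption: center across snapshots so identical predictions give 0.
    centered = preds - preds.mean(axis=0, keepdims=True)

    if weights is not None:
        w = np.asarray(weights, dtype=np.float64)
        if w.shape != (preds.shape[0],):
            raise ValueError("weights must have shape (K,)")
        # Scale each snapshot's deviation by its weight.
        centered = centered * w[:, None]

    # Nuclear norm: sum of singular values of the deviation matrix.
    return float(np.linalg.norm(centered, ord="nuc"))


if __name__ == "__main__":
    agree = [[0.2, 0.8], [0.2, 0.8], [0.2, 0.8]]
    disagree = [[1.0, 0.0], [0.0, 1.0]]
    print(temporal_inconsistency_reward(agree))
    print(temporal_inconsistency_reward(disagree))
```

Under these assumptions, an observation on which all snapshots agree earns no intrinsic reward, while one on which they diverge earns a larger one.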
Taxonomy
Topics: Neural dynamics and brain function
