Self-Supervised Exploration via Temporal Inconsistency in Reinforcement Learning

Zijian Gao; Kele Xu; Yuanzhao Zhai; Dawei Feng; Bo Ding; XinJun Mao; Huaimin Wang

arXiv:2208.11361·cs.LG·June 28, 2023·1 cites

TL;DR

This paper introduces a novel intrinsic reward mechanism for reinforcement learning that leverages temporal inconsistency in self-supervised models, inspired by human curiosity, to improve learning under sparse reward conditions.

Contribution

The paper proposes a new intrinsic reward based on temporal inconsistency of self-supervised predictions, with a variational weighting mechanism, outperforming existing methods without extra training costs.

Findings

1. Outperforms other intrinsic reward methods on benchmarks
2. Demonstrates higher noise tolerance
3. Requires no additional training costs

Abstract

Reinforcement learning under sparse extrinsic rewards remains challenging, despite surging interest in the field. Previous work suggests that intrinsic rewards can alleviate the problems caused by sparsity. In this article, we present a novel intrinsic reward inspired by human learning: humans gauge curiosity by comparing current observations with historical knowledge. Our method trains a self-supervised prediction model, saves snapshots of the model parameters, and uses the nuclear norm to evaluate the temporal inconsistency between the predictions of different snapshots as the intrinsic reward. We also propose a variational weighting mechanism that assigns weights to the snapshots adaptively. Our experimental results on various benchmark environments demonstrate the efficacy of our method, which outperforms other intrinsic reward-based…
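The abstract's core idea, scoring disagreement between model snapshots with a nuclear norm, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function name `temporal_inconsistency_reward`, the centering step, and the fixed `weights` argument (a stand-in for the paper's variational weighting, whose details are not given here) are all assumptions made for illustration.

```python
import numpy as np


def temporal_inconsistency_reward(predictions, weights=None):
    """Illustrative intrinsic reward from K snapshot predictions.

    predictions: array of shape (K, d), one predicted feature vector per
                 saved snapshot of the prediction model (hypothetical layout).
    weights:     optional length-K non-negative weights; uniform if omitted.
                 The paper learns these adaptively; here they are fixed.
    """
    preds = np.asarray(predictions, dtype=float)
    if preds.ndim != 2:
        raise ValueError("predictions must have shape (K, d)")

    k = preds.shape[0]
    if weights is None:
        w = np.full(k, 1.0 / k)
    else:
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()  # normalise so the weights sum to 1

    # Assumption: subtract the weighted mean prediction so that snapshots
    # which agree exactly produce a reward of zero.
    centered = preds - np.average(preds, axis=0, weights=w)

    # Scale each snapshot's deviation by its weight, then take the nuclear
    # norm (sum of singular values) of the resulting matrix.
    return float(np.linalg.norm(centered * w[:, None], ord="nuc"))


# Snapshots that agree yield (near) zero reward; disagreement yields more.
agree = temporal_inconsistency_reward([[1.0, 2.0, 3.0]] * 3)
disagree = temporal_inconsistency_reward([[1.0, 0.0], [0.0, 1.0]])
```

In this sketch a state that all snapshots predict identically earns no bonus, while a state on which older and newer snapshots diverge earns a larger one. That matches the abstract's intuition of comparing current observations with historical knowledge.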

Taxonomy

Topics: Neural dynamics and brain function