Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning
Mingqi Yuan, Bo Li, Xin Jin, Wenjun Zeng

TL;DR
This paper introduces REVD, a computationally efficient intrinsic reward method based on the visitation discrepancy between episodes, which improves exploration and sample efficiency in reinforcement learning without requiring representation learning.
Contribution
The paper proposes REVD, a novel visitation discrepancy-based intrinsic reward method that is simple, efficient, and effective for exploration in reinforcement learning.
Findings
REVD significantly improves sample efficiency in Atari and robotics environments.
REVD outperforms existing exploration methods in benchmark tests.
REVD has lower computational cost than prior approaches.
Abstract
Exploration is critical for deep reinforcement learning in complex environments with high-dimensional observations and sparse rewards. To address this problem, recent approaches leverage intrinsic rewards to improve exploration, such as novelty-based exploration and prediction-based exploration. However, many intrinsic reward modules require sophisticated structures and representation learning, resulting in prohibitive computational complexity and unstable performance. In this paper, we propose Rewarding Episodic Visitation Discrepancy (REVD), a computationally efficient and quantified exploration method. More specifically, REVD provides intrinsic rewards by evaluating the Rényi divergence-based visitation discrepancy between episodes. To estimate the divergence efficiently, a k-nearest neighbor estimator is used with a randomly initialized state encoder. Finally, the…
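The estimation step described in the abstract (a fixed random encoder followed by k-nearest-neighbor distances within and across episodes) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: the function names, the linear random projection standing in for the state encoder, and the log-ratio score are hypothetical and are not the paper's exact reward formula or the authors' implementation.

```python
import numpy as np


def make_random_encoder(obs_dim, latent_dim, seed=0):
    """Return a fixed random linear projection (hypothetical stand-in for the
    randomly initialized state encoder; its weights are never trained)."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(0.0, 1.0 / np.sqrt(obs_dim), size=(obs_dim, latent_dim))

    def encode(obs):
        return np.asarray(obs, dtype=np.float64) @ weights

    return encode


def kth_nn_distance(queries, points, k, exclude_self=False):
    """Euclidean distance from each query to its k-th nearest neighbor in points.

    Set exclude_self=True when queries and points are the same array, so that
    the zero distance of each point to itself is skipped.
    """
    diffs = queries[:, None, :] - points[None, :, :]
    dists = np.sort(np.linalg.norm(diffs, axis=-1), axis=1)
    index = k if exclude_self else k - 1
    return dists[:, index]


def discrepancy_rewards(curr_obs, prev_obs, encode, k=3, eps=1e-8):
    """Illustrative per-state score (an assumption, not the paper's formula).

    A state scores higher when its k-th neighbor in the previous episode is
    far away relative to its k-th neighbor in the current episode.
    """
    e_curr = encode(curr_obs)
    e_prev = encode(prev_obs)
    within = kth_nn_distance(e_curr, e_curr, k, exclude_self=True)
    across = kth_nn_distance(e_curr, e_prev, k)
    return np.log1p(across / (within + eps))
```

In this sketch, states of the current episode that lie far from everything visited in the previous episode receive larger scores than states that revisit familiar regions. Each episode must contain more than k states for the neighbor lookup to be valid.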
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Explainable Artificial Intelligence (XAI)
