Long-Term Visitation Value for Deep Exploration in Sparse Reward Reinforcement Learning

Simone Parisi; Davide Tateo; Maximilian Hensel; Carlo D'Eramo; Jan Peters; Joni Pajarinen

arXiv:2001.00119·cs.LG·March 4, 2022

TL;DR

This paper introduces a model-free, off-policy reinforcement learning method that uses long-term visitation counts to improve exploration in environments with sparse rewards. It outperforms existing methods, especially when some rewards create suboptimal modes of the objective function.

Contribution

The paper proposes an exploration strategy based on long-term visitation values, decouples exploration from exploitation by learning a separate function that assesses the exploration value of actions, and introduces new benchmarks for evaluation.

Findings

01 Outperforms existing exploration methods in sparse reward environments
02 Scales well with environment size
03 Effective in environments with suboptimal reward modes

Abstract

Reinforcement learning with sparse rewards is still an open challenge. Classic methods rely on getting feedback via extrinsic rewards to train the agent, and in situations where this occurs very rarely the agent learns slowly or cannot learn at all. Similarly, if the agent receives also rewards that create suboptimal modes of the objective function, it will likely prematurely stop exploring. More recent methods add auxiliary intrinsic rewards to encourage exploration. However, auxiliary rewards lead to a non-stationary target for the Q-function. In this paper, we present a novel approach that (1) plans exploration actions far into the future by using a long-term visitation count, and (2) decouples exploration and exploitation by learning a separate function assessing the exploration value of the actions. Contrary to existing methods which use models of reward and dynamics, our approach…
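The two ideas in the abstract, a long-term visitation count and a separate function for exploration value, can be illustrated with a small tabular sketch. This is not the authors' implementation, and the abstract above is truncated, so every detail here is an assumption made for illustration: the chain environment, the function name `run_chain`, the visitation reward `1 / sqrt(n(s, a))`, the optimistic initialization of `W`, and the mixing weight `kappa` are all hypothetical choices. The sketch keeps one table `Q` trained only on the extrinsic reward (so its target stays stationary) and a second table `W` trained on a count-based reward with its own discount, so that `W` propagates how rarely visited the states further down the chain are.

```python
import numpy as np


def run_chain(n_states=10, episodes=200, horizon=30, gamma=0.99,
              gamma_w=0.9, alpha=0.5, kappa=1.0, seed=0):
    """Tabular sketch with separate extrinsic (Q) and visitation (W) values.

    All names and hyperparameters are illustrative assumptions,
    not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n_actions = 2  # 0 = move left, 1 = move right
    Q = np.zeros((n_states, n_actions))
    # Optimistic start (assumption): untried actions look maximally novel,
    # since the visitation reward used below never exceeds 1.
    W = np.full((n_states, n_actions), 1.0 / (1.0 - gamma_w))
    counts = np.zeros((n_states, n_actions))

    for _ in range(episodes):
        s = 0
        for _ in range(horizon):
            # Behavior policy: greedy on Q plus weighted exploration value.
            score = Q[s] + kappa * W[s]
            best = np.flatnonzero(score == score.max())
            a = int(rng.choice(best))  # random tie-breaking

            s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            done = s_next == n_states - 1
            r = 1.0 if done else 0.0  # sparse reward at the far end only

            counts[s, a] += 1
            r_visit = 1.0 / np.sqrt(counts[s, a])  # assumed count reward

            # Q sees only the extrinsic reward.
            q_target = r + (0.0 if done else gamma * Q[s_next].max())
            Q[s, a] += alpha * (q_target - Q[s, a])

            # W bootstraps on itself, giving a long-term visitation value.
            w_target = r_visit + (0.0 if done else gamma_w * W[s_next].max())
            W[s, a] += alpha * (w_target - W[s, a])

            if done:
                break
            s = s_next
    return Q, W, counts


if __name__ == "__main__":
    Q, W, counts = run_chain()
    print("visit counts per state:", counts.sum(axis=1))
    print("greedy action per state:", Q.argmax(axis=1))
```

Because the intrinsic signal never enters the `Q` update, the exploitation values are unaffected by how the counts change over time; only the behavior policy mixes the two tables.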

Code & Models

Repositories

sparisi/visit-value-explore (Official)
