Self-supervised network distillation: an effective approach to exploration in sparse reward environments

Matej Pecháč; Michal Chovanec; Igor Farkaš

arXiv:2302.11563·cs.AI·June 12, 2024

TL;DR

This paper introduces Self-supervised Network Distillation (SND), a novel intrinsic motivation method for reinforcement learning that enhances exploration in sparse reward environments by using distillation error as a novelty signal.

Contribution

The paper proposes SND, a new intrinsic motivation approach based on self-supervised distillation error, improving exploration efficiency in challenging sparse reward settings.

Findings

1. SND achieves faster reward accumulation compared to baselines.
2. The approach improves exploration in environments with sparse rewards.
3. Analytical methods offer insights into model behavior.

Abstract

Reinforcement learning can solve decision-making problems by training an agent to behave in an environment according to a predesigned reward function. However, this approach breaks down when the reward is so sparse that the agent never encounters it while exploring the environment. One solution is to equip the agent with an intrinsic motivation that drives informed exploration, during which the agent is also likely to encounter the external reward. Novelty detection is one of the promising branches of intrinsic motivation research. We present Self-supervised Network Distillation (SND), a class of intrinsic motivation algorithms that use the distillation error as a novelty indicator and in which both the predictor model and the target model are trained. We adapted three existing self-supervised methods for this purpose and experimentally…
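As a rough illustration of the idea of using distillation error as a novelty indicator, the following minimal sketch uses toy linear networks in plain Python. Everything in it is an assumption made for illustration: the helper names (`make_linear`, `forward`, `distillation_error`, `train_predictor`), the dimensions, the learning rate, and the two example states are hypothetical and not taken from the paper. The target is held fixed here, so the sketch covers only the plain distillation-error signal; the self-supervised training of the target that distinguishes SND is not specified in this excerpt and is left out.

```python
import random


def make_linear(in_dim, out_dim, rng, scale=0.5):
    """Random weight matrix (out_dim rows, in_dim columns) for a toy linear network."""
    return [[rng.uniform(-scale, scale) for _ in range(in_dim)]
            for _ in range(out_dim)]


def forward(weights, state):
    """Feature vector produced by a linear network for one state."""
    return [sum(w * x for w, x in zip(row, state)) for row in weights]


def distillation_error(predictor, target, state):
    """Novelty signal: half the squared distance between predictor and target features."""
    p = forward(predictor, state)
    t = forward(target, state)
    return 0.5 * sum((pi - ti) ** 2 for pi, ti in zip(p, t))


def train_predictor(predictor, target, state, lr=0.1):
    """One gradient step on the distillation error with respect to the predictor only."""
    p = forward(predictor, state)
    t = forward(target, state)
    for row, pi, ti in zip(predictor, p, t):
        for j, x in enumerate(state):
            row[j] -= lr * (pi - ti) * x


rng = random.Random(0)
target = make_linear(4, 3, rng)      # held fixed in this sketch; SND would also train it
predictor = make_linear(4, 3, rng)

familiar = [1.0, 0.0, 0.5, -0.5]     # hypothetical state the agent keeps revisiting
novel = [0.0, 1.0, 0.0, 0.0]         # hypothetical state the agent has never visited

familiar_before = distillation_error(predictor, target, familiar)
for _ in range(200):
    train_predictor(predictor, target, familiar)
familiar_after = distillation_error(predictor, target, familiar)
novel_after = distillation_error(predictor, target, novel)
```

After the repeated visits, the error on the familiar state shrinks toward zero while the unvisited state keeps a larger error, which is what makes the quantity usable as an exploration bonus. In a reinforcement learning loop this value would typically be scaled and added to the external reward; the scaling is a design choice not covered by this excerpt.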

Taxonomy

Topics: Advanced Algorithms and Applications