Self-supervised network distillation: an effective approach to exploration in sparse reward environments
Matej Pecháč, Michal Chovanec, Igor Farkaš

TL;DR
This paper introduces Self-supervised Network Distillation (SND), a novel intrinsic motivation method for reinforcement learning that enhances exploration in sparse reward environments by using distillation error as a novelty signal.
Contribution
The paper proposes SND, a new intrinsic motivation approach based on self-supervised distillation error, improving exploration efficiency in challenging sparse reward settings.
Findings
SND achieves faster reward accumulation compared to baselines.
The approach improves exploration in environments with sparse rewards.
Analytical methods offer insights into model behavior.
Abstract
Reinforcement learning can solve decision-making problems and train an agent to behave in an environment according to a predesigned reward function. However, such an approach becomes very problematic if the reward is too sparse and so the agent does not come across the reward during the environmental exploration. The solution to such a problem may be to equip the agent with an intrinsic motivation that will provide informed exploration during which the agent is likely to also encounter external reward. Novelty detection is one of the promising branches of intrinsic motivation research. We present Self-supervised Network Distillation (SND), a class of intrinsic motivation algorithms based on the distillation error as a novelty indicator, where the predictor model and the target model are both trained. We adapted three existing self-supervised methods for this purpose and experimentally…
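As a rough illustration of using distillation error as a novelty signal, the sketch below trains a predictor to match a target model on "familiar" states and then compares the error on familiar and unseen states. It is a minimal sketch under stated assumptions, not the authors' implementation: the target here is a fixed random feature map (closer to plain random network distillation), whereas SND also trains the target with a self-supervised objective, which is omitted. All names, dimensions, and the learning rate are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, FEAT_DIM = 8, 4  # hypothetical sizes, chosen for illustration

# Assumption: the target is a fixed random tanh feature map.
# In SND the target model is also trained (self-supervised); omitted here.
W_target = rng.normal(size=(STATE_DIM, FEAT_DIM))


def target_features(states):
    """Features produced by the target model."""
    return np.tanh(states @ W_target)


def distillation_error(states, W_pred):
    """Per-state intrinsic reward: squared predictor-target mismatch."""
    diff = states @ W_pred - target_features(states)
    return 0.5 * np.sum(diff ** 2, axis=1)


def distill_step(states, W_pred, lr=0.05):
    """One gradient-descent step on the mean distillation error."""
    diff = states @ W_pred - target_features(states)
    grad = states.T @ diff / len(states)
    return W_pred - lr * grad


# Familiar states cluster around one point, novel states around another.
familiar_center = np.ones(STATE_DIM)
novel_center = np.array([1.0, -1.0] * (STATE_DIM // 2))
familiar = familiar_center + 0.01 * rng.normal(size=(64, STATE_DIM))
novel = novel_center + 0.01 * rng.normal(size=(64, STATE_DIM))

W_pred = np.zeros((STATE_DIM, FEAT_DIM))
reward_before = distillation_error(familiar, W_pred).mean()

# Train the predictor only on the states the agent has visited.
for _ in range(200):
    W_pred = distill_step(familiar, W_pred)

reward_familiar = distillation_error(familiar, W_pred).mean()
reward_novel = distillation_error(novel, W_pred).mean()

print("familiar, before training:", reward_before)
print("familiar, after training: ", reward_familiar)
print("novel, after training:    ", reward_novel)
```

In this toy setup the error on visited states shrinks as the predictor is trained, while unseen states keep a larger error. That gap is what lets the error act as an exploration bonus.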
Taxonomy
Topics: Advanced Algorithms and Applications
