Exploration by Random Distribution Distillation

Zhirui Fang; Kai Yang; Jian Tao; Jiafei Lyu; Lusong Li; Li Shen; Xiu Li

arXiv:2505.11044 · cs.LG · May 19, 2025

TL;DR

This paper introduces Random Distribution Distillation (RDD), a novel exploration method in reinforcement learning that combines count-based and prediction-error approaches through stochastic target network outputs, enhancing exploration efficiency.

Contribution

RDD is a new exploration technique that models target network outputs as samples from a normal distribution, unifying count-based and prediction-error methods in RL.

Findings

01. RDD improves exploration in high-dimensional spaces.

02. Experimental results show RDD outperforms existing methods.

03. Theoretical analysis shows that RDD's intrinsic reward is bounded by two key components, one of which is a pseudo-count term.

Abstract

Exploration remains a critical challenge in online reinforcement learning, as an agent must effectively explore unknown environments to achieve high returns. Current exploration algorithms are mainly count-based methods and curiosity-based methods, with prediction-error methods being a prominent example of the latter. In this paper, we propose a novel method called Random Distribution Distillation (RDD), which samples the output of a target network from a normal distribution. RDD facilitates more extensive exploration by explicitly treating the difference between the prediction network and the target network as an intrinsic reward. Furthermore, by introducing randomness into the output of the target network for a given state and modeling it as a sample from a normal distribution, intrinsic rewards are bounded by two key components: a pseudo-count term…
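
The mechanism described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class name `RandomDistributionSketch`, the linear predictor, the `tanh` target mean, the constant standard deviation `sigma`, and the squared-error reward are all assumptions made here for illustration, and the paper's exact formulation may differ. The sketch only shows the general idea of distilling a predictor toward target outputs that are sampled from a normal distribution, and of using the remaining gap as an intrinsic reward.

```python
import numpy as np


class RandomDistributionSketch:
    """Toy illustration: a predictor distilled toward noisy target samples.

    All names and modelling choices here are hypothetical, not from the paper.
    """

    def __init__(self, state_dim, feat_dim=8, sigma=0.1, lr=0.5, seed=0):
        self.rng = np.random.default_rng(seed)
        # Fixed, randomly initialised target weights (never trained).
        self.w_target = self.rng.normal(size=(state_dim, feat_dim))
        # Trainable linear predictor, starting at zero.
        self.w_pred = np.zeros((state_dim, feat_dim))
        self.sigma = sigma  # assumed constant std of the target distribution
        self.lr = lr

    def sample_target(self, state):
        # Target output is a sample from N(mean(state), sigma^2 I).
        mean = np.tanh(state @ self.w_target)
        return mean + self.sigma * self.rng.normal(size=mean.shape)

    def predict(self, state):
        return state @ self.w_pred

    def intrinsic_reward(self, state):
        # Mean squared gap between the prediction and one sampled target.
        diff = self.predict(state) - self.sample_target(state)
        return float(np.mean(diff ** 2))

    def update(self, state):
        # One SGD step on the mean squared error against a fresh target sample.
        error = self.predict(state) - self.sample_target(state)
        grad = np.outer(state, error) * (2.0 / error.size)
        self.w_pred -= self.lr * grad


if __name__ == "__main__":
    model = RandomDistributionSketch(state_dim=4)
    seen, novel = np.eye(4)[0], np.eye(4)[1]
    for _ in range(200):
        model.update(seen)  # repeatedly visit one state
    print("reward at visited state:", model.intrinsic_reward(seen))
    print("reward at novel state:  ", model.intrinsic_reward(novel))
```

In this toy setting the reward at a frequently visited state shrinks toward the noise floor set by `sigma`, while an unvisited state keeps a larger reward, which is the qualitative behaviour an exploration bonus is meant to have.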

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques