From Novelty to Imitation: Self-Distilled Rewards for Offline Reinforcement Learning

Gaurav Chaudhary; Laxmidhar Behera

arXiv:2507.12815 · cs.LG · December 23, 2025

Open Access

TL;DR

ReLOAD uses Random Network Distillation to generate intrinsic rewards from expert demonstrations, enabling offline RL without explicit reward annotations while achieving competitive performance on D4RL benchmarks.

Contribution

The paper presents ReLOAD, a novel reward annotation framework that leverages RND for intrinsic reward generation in offline RL, eliminating the need for handcrafted reward signals.

Findings

01. ReLOAD achieves competitive performance on D4RL benchmarks.

02. The method effectively distinguishes expert-like transitions using prediction errors.

03. ReLOAD simplifies reward annotation in offline RL without complex alignment procedures.

Abstract

Offline Reinforcement Learning (RL) aims to learn effective policies from a static dataset without requiring further agent-environment interactions. However, its practical adoption is often hindered by the need for explicit reward annotations, which can be costly to engineer or difficult to obtain retrospectively. To address this, we propose ReLOAD (Reinforcement Learning with Offline Reward Annotation via Distillation), a novel reward annotation framework for offline RL. Unlike existing methods that depend on complex alignment procedures, our approach adapts Random Network Distillation (RND) to generate intrinsic rewards from expert demonstrations using a simple yet effective embedding discrepancy measure. First, we train a predictor network to mimic a fixed target network's embeddings based on expert state transitions. Later, the prediction error between these networks serves as a…
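The two steps described in the abstract (fit a predictor to a fixed target network on expert transitions, then measure the prediction error) can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: a transition is modelled as the concatenation of a state and its successor, the target is a single random tanh layer, the predictor is linear and fitted by least squares rather than by gradient training, and the data are synthetic. The excerpt is truncated before it says how the error is turned into a reward, so the sketch stops at the raw error.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, EMB_DIM = 2, 8      # hypothetical sizes, chosen for the toy example
IN_DIM = 2 * STATE_DIM         # assumption: a transition is concat(s, s')

# Fixed target network: random weights that are never trained.
W_target = rng.normal(size=(IN_DIM, EMB_DIM)) / np.sqrt(IN_DIM)


def target_embed(x):
    """Embedding produced by the frozen random target network."""
    return np.tanh(x @ W_target)


def add_bias(x):
    """Append a constant feature so the linear predictor has an offset."""
    return np.hstack([x, np.ones((x.shape[0], 1))])


def fit_predictor(expert_x):
    """Fit the predictor to the target's embeddings on expert transitions only.

    A linear least-squares fit stands in for the neural predictor here.
    """
    P, *_ = np.linalg.lstsq(add_bias(expert_x), target_embed(expert_x), rcond=None)
    return P


def prediction_error(P, x):
    """Squared embedding discrepancy between predictor and target, per transition."""
    diff = add_bias(x) @ P - target_embed(x)
    return np.sum(diff ** 2, axis=1)


# Synthetic data: expert transitions cluster in a small region, others do not.
expert_x = 0.5 + 0.1 * rng.normal(size=(200, IN_DIM))
other_x = 3.0 * rng.normal(size=(200, IN_DIM))

P = fit_predictor(expert_x)
expert_err = prediction_error(P, expert_x)
other_err = prediction_error(P, other_x)
print("mean error on expert transitions:", expert_err.mean())
print("mean error on other transitions: ", other_err.mean())
```

In this toy setup the predictor matches the target closely on the region it was fitted on and poorly elsewhere, so a low error marks a transition as expert-like. That is the same qualitative signal the Findings describe, although the paper's networks, training procedure, and reward mapping may differ.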

Taxonomy

Topics: Reinforcement Learning in Robotics