From Novelty to Imitation: Self-Distilled Rewards for Offline Reinforcement Learning
Gaurav Chaudhary, Laxmidhar Behera

TL;DR
ReLOAD introduces a simple method that uses Random Network Distillation (RND) to generate intrinsic rewards from expert demonstrations, enabling offline RL without explicit reward annotations while achieving competitive performance on D4RL benchmarks.
Contribution
The paper presents ReLOAD, a novel reward annotation framework that leverages RND for intrinsic reward generation in offline RL, eliminating the need for handcrafted reward signals.
Findings
ReLOAD achieves competitive performance on D4RL benchmarks.
The method effectively distinguishes expert-like transitions using prediction errors.
ReLOAD simplifies reward annotation in offline RL by avoiding complex alignment procedures.
Abstract
Offline Reinforcement Learning (RL) aims to learn effective policies from a static dataset without requiring further agent-environment interactions. However, its practical adoption is often hindered by the need for explicit reward annotations, which can be costly to engineer or difficult to obtain retrospectively. To address this, we propose ReLOAD (Reinforcement Learning with Offline Reward Annotation via Distillation), a novel reward annotation framework for offline RL. Unlike existing methods that depend on complex alignment procedures, our approach adapts Random Network Distillation (RND) to generate intrinsic rewards from expert demonstrations using a simple yet effective embedding discrepancy measure. First, we train a predictor network to mimic a fixed target network's embeddings on expert state transitions. Then, the prediction error between these networks serves as a…
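The two steps in the abstract (fit a predictor to a frozen random target on expert transitions, then score transitions by the prediction error) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the network sizes, learning rate, synthetic "expert" data, the use of concatenated (s, s') pairs as input, and all function names are hypothetical. In particular, the abstract is truncated before it says how the error becomes a reward, so the mapping used here, `exp(-error / scale)` with `scale` set to the mean expert error, is an assumed choice that simply gives low-error (expert-like) transitions rewards near 1.

```python
import numpy as np

def init_mlp(rng, in_dim, hidden, out_dim):
    """Two-layer tanh MLP (hypothetical architecture); random biases included."""
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(in_dim), (in_dim, hidden)),
        "b1": rng.normal(0.0, 0.1, hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, out_dim)),
        "b2": rng.normal(0.0, 0.1, out_dim),
    }

def forward(params, x):
    h = np.tanh(x @ params["W1"] + params["b1"])
    return h, h @ params["W2"] + params["b2"]

def prediction_error(pred, target, x):
    """Per-transition squared L2 distance between predictor and target embeddings."""
    _, y = forward(target, x)
    _, p = forward(pred, x)
    return np.sum((p - y) ** 2, axis=1)

def train_predictor(pred, target, x, lr=0.01, steps=2000):
    """Full-batch gradient descent on the mean squared embedding error.
    Only the predictor is updated; the target network stays fixed."""
    _, y = forward(target, x)
    n = x.shape[0]
    for _ in range(steps):
        h, out = forward(pred, x)
        g_out = 2.0 * (out - y) / n                    # dLoss/d(out)
        g_z = (g_out @ pred["W2"].T) * (1.0 - h ** 2)  # backprop through tanh
        grads = {
            "W1": x.T @ g_z, "b1": g_z.sum(axis=0),
            "W2": h.T @ g_out, "b2": g_out.sum(axis=0),
        }
        for k, g in grads.items():
            pred[k] -= lr * g
    return pred

def annotate_rewards(pred, target, x, scale):
    """Assumed error-to-reward mapping: low error (expert-like) -> reward near 1."""
    return np.exp(-prediction_error(pred, target, x) / scale)

# Hypothetical expert transitions: concatenated (s, s') pairs with 2-D states.
rng = np.random.default_rng(0)
s = rng.normal(size=(256, 2))
s_next = s + 0.1 * np.array([1.0, -0.5])
expert = np.concatenate([s, s_next], axis=1)

target = init_mlp(rng, in_dim=4, hidden=16, out_dim=8)     # fixed, never trained
predictor = init_mlp(rng, in_dim=4, hidden=16, out_dim=8)

err_before = prediction_error(predictor, target, expert).mean()
train_predictor(predictor, target, expert)
err_after = prediction_error(predictor, target, expert).mean()

# Annotate an unlabeled offline batch; a shifted copy stands in for non-expert data.
offline = expert + 1.5
rewards = annotate_rewards(predictor, target, offline, scale=err_after)
```

The target network is never updated, so the predictor can only drive the error down on inputs resembling the expert transitions it was trained on. Transitions far from that data are therefore expected to keep a larger error and receive a lower reward, although a toy setup like this one does not guarantee that separation.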
Taxonomy
Topics: Reinforcement Learning in Robotics
