Dataset Distillation for Offline Reinforcement Learning

Jonathan Light; Yuanzhe Liu; Ziniu Hu

arXiv:2407.20299·cs.LG·November 4, 2025

TL;DR

This paper applies dataset distillation to offline reinforcement learning: it synthesizes a dataset on which a trained policy performs similarly to one trained on the full offline dataset or with percentile behavioral cloning.

Contribution

The paper presents a dataset distillation approach tailored to offline reinforcement learning that synthesizes a training dataset from the available offline data, which may be limited or suboptimal.
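To make the idea of distilling a dataset concrete, here is a minimal sketch of the general technique on a toy behavioral cloning problem. It is not the paper's algorithm. It assumes a linear policy fitted by ridge regression, keeps the synthetic states fixed, and learns only the synthetic actions so that a policy fitted on the synthetic pairs imitates the real data. The function names `fit_ridge` and `distill_actions` and all parameters are hypothetical.

```python
import numpy as np


def fit_ridge(X, Y, lam):
    """Inner step: fit a linear policy W on (states X, actions Y) by ridge regression."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)


def distill_actions(X_real, Y_real, X_syn, lam=1e-3, steps=200):
    """Outer step: learn synthetic actions Y_syn for fixed synthetic states X_syn.

    Assumption (illustrative, not from the paper): the policy is linear, so the
    ridge solution is linear in the synthetic actions, W = A @ Y_syn.
    """
    n, d = X_real.shape
    # A maps synthetic actions to the weights of the policy trained on them.
    A = np.linalg.solve(X_syn.T @ X_syn + lam * np.eye(d), X_syn.T)
    # P maps synthetic actions to that policy's predictions on the real states.
    P = X_real @ A
    # Step size 1/L, where L is the Lipschitz constant of the outer gradient.
    lipschitz = 2.0 * np.linalg.norm(P, 2) ** 2 / n
    lr = 1.0 / lipschitz
    Y_syn = np.zeros((X_syn.shape[0], Y_real.shape[1]))
    for _ in range(steps):
        residual = P @ Y_syn - Y_real        # imitation error on the real data
        grad = 2.0 * P.T @ residual / n      # gradient of the mean squared error
        Y_syn -= lr * grad
    return Y_syn
```

A policy is then obtained with `fit_ridge(X_syn, Y_syn, lam)` using only the synthetic pairs. Practical dataset distillation methods usually also learn the synthetic inputs and use a neural network policy, which requires differentiating through or approximating the inner training loop.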

Findings

01

Policies trained on the synthesized dataset perform comparably to policies trained on the full dataset or with percentile behavioral cloning.

02

The distilled dataset is intended to serve as a better training set for offline RL policies than the raw offline data.

03

Implementation and project site are publicly available for reproducibility.

Abstract

Offline reinforcement learning often requires a high-quality dataset on which to train a policy. In many situations, however, such a dataset is not available, and it is not easy to train a policy that performs well in the actual environment from the offline data alone. We propose using data distillation to synthesize a better dataset, which can then be used to train a better policy model. We show that our method synthesizes a dataset such that a model trained on it achieves performance similar to a model trained on the full dataset or a model trained using percentile behavioral cloning. Our project site is available at https://datasetdistillation4rl.github.io. We also provide our implementation at https://github.com/ggflow123/DDRL.
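Percentile behavioral cloning, the baseline named in the abstract, is commonly described as behavioral cloning on only the top fraction of trajectories ranked by return. The sketch below shows that filtering step under assumptions not taken from the paper: trajectories are dictionaries with hypothetical keys `states`, `actions`, and `rewards`, returns are undiscounted reward sums, and the helper names are illustrative.

```python
import numpy as np


def percentile_bc_indices(returns, top_percent):
    """Indices of the trajectories whose return is in the top `top_percent` percent."""
    returns = np.asarray(returns, dtype=float)
    if not 0 < top_percent <= 100:
        raise ValueError("top_percent must be in (0, 100]")
    # Keep at least one trajectory, rounding the count up.
    k = max(1, int(np.ceil(len(returns) * top_percent / 100.0)))
    order = np.argsort(-returns, kind="stable")  # highest return first
    return np.sort(order[:k])


def build_bc_dataset(trajectories, top_percent):
    """Concatenate the (state, action) pairs of the selected trajectories."""
    # Assumption: the return of a trajectory is the undiscounted sum of its rewards.
    returns = [float(np.sum(t["rewards"])) for t in trajectories]
    keep = percentile_bc_indices(returns, top_percent)
    states = np.concatenate([trajectories[i]["states"] for i in keep])
    actions = np.concatenate([trajectories[i]["actions"] for i in keep])
    return states, actions
```

The resulting `states` and `actions` arrays would then be passed to an ordinary supervised behavioral cloning loop. The percentage is a tuning choice; the abstract does not say which value the authors use.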

Code & Models

Repositories

ggflow123/ddrl (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics