Offline-to-Online Reinforcement Learning with Classifier-Free Diffusion Generation

Xiao Huang; Xu Liu; Enze Zhang; Tong Yu; Shuai Li

arXiv:2508.06806·cs.LG·August 12, 2025

TL;DR

This paper introduces CFDG, a classifier-free diffusion-based data augmentation method that improves offline-to-online reinforcement learning by generating data better aligned with the online distribution, yielding a 15% average performance improvement on D4RL benchmarks.

Contribution

Proposes CFDG, a novel classifier-free diffusion approach for data augmentation in offline-to-online RL, enhancing data quality without extra classifier training.

Findings

01 CFDG outperforms standard data generation methods.

02 Achieves 15% average performance improvement on D4RL benchmarks.

03 Easily integrates with existing offline-to-online RL algorithms.

Abstract

Offline-to-online Reinforcement Learning (O2O RL) aims to perform online fine-tuning on an offline pre-trained policy to minimize costly online interactions. Existing work used offline datasets to generate data that conform to the online data distribution for data augmentation. However, generated data still exhibits a gap with the online data, limiting overall performance. To address this, we propose a new data augmentation approach, Classifier-Free Diffusion Generation (CFDG). Without introducing additional classifier training overhead, CFDG leverages classifier-free guidance diffusion to significantly enhance the generation quality of offline and online data with different distributions. Additionally, it employs a reweighting method to enable more generated data to align with the online data, enhancing performance while maintaining the agent's stability. Experimental results show that…
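
The abstract names classifier-free guidance but does not spell out the mechanics. As a rough illustration only, the sketch below shows the generic technique, not the paper's implementation. Everything in it is an assumption made for the example: the function names `drop_labels` and `cfg_noise`, the label coding (0 = offline, 1 = online, -1 = null), and the guidance convention in which a scale of 0 gives the unconditional prediction and a scale of 1 gives the conditional one. The paper's reweighting step is not covered.

```python
import numpy as np

# Hypothetical label coding for this sketch (not taken from the paper):
# 0 = offline transition, 1 = online transition, -1 = "no condition".
NULL_LABEL = -1


def drop_labels(labels, p_uncond, null_label=NULL_LABEL, rng=None):
    """Training-time trick: randomly replace condition labels with a null
    label, so one network learns both conditional and unconditional
    noise prediction and no separate classifier is needed."""
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels)
    mask = rng.random(labels.shape) < p_uncond  # True -> drop the condition
    return np.where(mask, null_label, labels)


def cfg_noise(eps_cond, eps_uncond, guidance_scale):
    """Sampling-time blend of the two noise predictions.

    Convention assumed here: scale 0 returns the unconditional prediction,
    scale 1 returns the conditional one, and scale > 1 extrapolates
    further toward the condition."""
    eps_cond = np.asarray(eps_cond, dtype=float)
    eps_uncond = np.asarray(eps_uncond, dtype=float)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = np.array([0, 1, 1, 0, 1])
    print(drop_labels(labels, p_uncond=0.2, rng=rng))

    # Stand-ins for a denoiser's outputs with and without the "online" label.
    eps_online = np.array([1.0, 2.0])
    eps_null = np.array([0.5, 1.0])
    print(cfg_noise(eps_online, eps_null, guidance_scale=2.0))
```

In a full sampler, `cfg_noise` would be called at every denoising step with the model's two predictions for the current noisy sample; a larger scale pushes generated transitions harder toward the requested label at the cost of diversity.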

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Adaptive Dynamic Programming Control