Offline-to-Online Reinforcement Learning with Classifier-Free Diffusion Generation
Xiao Huang, Xu Liu, Enze Zhang, Tong Yu, Shuai Li

TL;DR
This paper introduces CFDG, a classifier-free diffusion-based data augmentation method for offline-to-online reinforcement learning. It generates data that is better aligned with the online data distribution, yielding a 15% average performance improvement on D4RL benchmarks.
Contribution
Proposes CFDG, a novel classifier-free diffusion approach for data augmentation in offline-to-online RL, enhancing data quality without extra classifier training.
Findings
CFDG outperforms standard data generation methods.
Achieves 15% average performance improvement on D4RL benchmarks.
Easily integrates with existing offline-to-online RL algorithms.
Abstract
Offline-to-online Reinforcement Learning (O2O RL) aims to perform online fine-tuning on an offline pre-trained policy to minimize costly online interactions. Existing work used offline datasets to generate data that conform to the online data distribution for data augmentation. However, generated data still exhibits a gap with the online data, limiting overall performance. To address this, we propose a new data augmentation approach, Classifier-Free Diffusion Generation (CFDG). Without introducing additional classifier training overhead, CFDG leverages classifier-free guidance diffusion to significantly enhance the generation quality of offline and online data with different distributions. Additionally, it employs a reweighting method to enable more generated data to align with the online data, enhancing performance while maintaining the agent's stability. Experimental results show that…
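The abstract names classifier-free guidance but does not spell out the sampling rule. As a rough illustration only, the sketch below shows the generic classifier-free guidance combination from the diffusion literature, not the authors' implementation. It assumes a single noise-prediction network that is queried once with a data-source label and once without. The names `guided_noise`, `drop_label`, `OFFLINE`, `ONLINE`, `guidance_scale`, and `p_uncond` are hypothetical and do not come from the paper.

```python
import random

# Hypothetical labels for the two data sources (an assumption for this sketch).
OFFLINE, ONLINE = 0, 1


def guided_noise(eps_cond, eps_uncond, guidance_scale):
    """Combine conditional and unconditional noise predictions.

    Generic classifier-free guidance rule:
        eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
    A scale of 0 gives the unconditional prediction, 1 gives the
    conditional one, and values above 1 push samples further toward
    the conditioned class.
    """
    return [u + guidance_scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


def drop_label(label, p_uncond, rng):
    """Training-time label dropout.

    With probability p_uncond the label is replaced by None, so one
    network learns both the conditional and unconditional predictions
    and no separate classifier has to be trained.
    """
    return None if rng.random() < p_uncond else label


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in predictions; a real model would produce these per sample.
    eps_online = [1.0, 2.0]   # prediction given the ONLINE label
    eps_none = [0.0, 1.0]     # prediction with the label dropped
    print(guided_noise(eps_online, eps_none, guidance_scale=2.0))
    print(drop_label(ONLINE, p_uncond=0.1, rng=rng))
```

Under these assumptions, sampling with the `ONLINE` label and a guidance scale above 1 would steer generated transitions toward the online distribution. How CFDG actually conditions, samples, and reweights the generated data is specified in the paper itself.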
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Adaptive Dynamic Programming Control
