Energy-Guided Diffusion Sampling for Offline-to-Online Reinforcement Learning
Xu-Hui Liu, Tian-Shuo Liu, Shengyi Jiang, Ruifeng Chen, Zhilong Zhang, Xinwei Chen, Yang Yu

TL;DR
This paper introduces EDIS, a diffusion-based method that uses offline data to improve online reinforcement learning. It mitigates distribution shift and improves performance on MuJoCo, AntMaze, and Adroit tasks.
Contribution
EDIS is a diffusion sampling approach that distills knowledge from offline data to generate better data for the online phase. It is compatible with existing offline-to-online RL methods.
Findings
EDIS achieves a 20% average performance boost on MuJoCo, AntMaze, and Adroit environments.
Theoretical analysis shows that EDIS attains lower suboptimality than either training solely on online data or directly reusing offline data.
EDIS can be integrated with methods like Cal-QL and IQL as a plug-in enhancement.
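One way to read "plug-in" is that the base learner's update rule (for example Cal-QL or IQL) is left untouched and only the data it trains on changes. The sketch below illustrates that reading with a hypothetical batch-mixing hook. The function name `mixed_batch`, the `gen_fraction` parameter, and the idea of a fixed mixing ratio are illustrative assumptions, not details reported by the paper.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def mixed_batch(real: Sequence[T], generated: Sequence[T], batch_size: int,
                gen_fraction: float, rng: random.Random) -> list[T]:
    """Draw a training batch with a fixed share of generated transitions.

    Hypothetical plug-in hook (not the EDIS code): the base RL algorithm
    consumes the returned batch unchanged; only its data source is augmented.
    """
    if not 0.0 <= gen_fraction <= 1.0:
        raise ValueError("gen_fraction must be in [0, 1]")
    n_gen = round(batch_size * gen_fraction)
    n_real = batch_size - n_gen
    # Sample with replacement from each buffer, as replay buffers typically do.
    batch = rng.choices(generated, k=n_gen) + rng.choices(real, k=n_real)
    rng.shuffle(batch)  # avoid ordering effects inside the batch
    return batch


if __name__ == "__main__":
    rng = random.Random(0)
    real = [("real", i) for i in range(100)]
    gen = [("gen", i) for i in range(100)]
    batch = mixed_batch(real, gen, batch_size=256, gen_fraction=0.5, rng=rng)
    print(len(batch), sum(1 for src, _ in batch if src == "gen"))
```

Keeping the hook outside the learner is what makes such an approach composable: swapping Cal-QL for IQL would not require touching the sampling code.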
Abstract
Combining offline and online reinforcement learning (RL) is crucial for efficient and safe learning in settings where data acquisition is expensive. Existing methods replay offline data directly in the online phase, which introduces a significant distribution shift and makes online fine-tuning inefficient. To address this issue, we introduce Energy-guided DIffusion Sampling (EDIS), which uses a diffusion model to extract prior knowledge from the offline dataset and employs energy functions to distill this knowledge for enhanced data generation in the online phase. Theoretical analysis shows that EDIS achieves lower suboptimality than using only online data or directly reusing offline data. EDIS is a plug-in approach and can be combined with existing methods…
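The abstract pairs a generative prior learned from offline data with energy functions that steer generation. The general principle behind such guidance is that adding the prior's score to the negative energy gradient targets the product density prior(x)·exp(−E(x)). The sketch below shows only that principle, under strong simplifying assumptions. It uses a 1-D toy problem in which an analytic N(0, 1) score stands in for a trained diffusion model. The energy E(x) = (x − 2)²/2 is a hypothetical stand-in for a learned energy. Unadjusted Langevin dynamics replaces the paper's reverse-diffusion sampler. For these choices the exact target is N(1, 0.5).

```python
import random
import statistics

# Hypothetical stand-ins, NOT the EDIS networks:
# - prior_score plays the role of a diffusion model pretrained on offline data.
# - energy_grad plays the role of a learned energy describing the online phase.


def prior_score(x: float) -> float:
    """Score (gradient of log-density) of the N(0, 1) prior."""
    return -x


def energy_grad(x: float, target: float = 2.0) -> float:
    """Gradient of the quadratic energy E(x) = (x - target)^2 / 2."""
    return x - target


def guided_langevin(n_samples: int = 2000, n_steps: int = 300,
                    step: float = 0.05, seed: int = 0) -> list[float]:
    """Sample from p(x) proportional to prior(x) * exp(-E(x)) via Langevin dynamics.

    The guided score is grad log prior(x) - grad E(x).
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]  # start from the prior
    noise_scale = step ** 0.5
    for _ in range(n_steps):
        xs = [
            x + 0.5 * step * (prior_score(x) - energy_grad(x))
            + noise_scale * rng.gauss(0.0, 1.0)
            for x in xs
        ]
    return xs


if __name__ == "__main__":
    samples = guided_langevin()
    # Exact target is N(1, 0.5); a finite step size adds a small variance bias.
    print(f"mean={statistics.fmean(samples):.3f} "
          f"var={statistics.pvariance(samples):.3f}")
```

Because the step size is finite, the stationary variance of this sampler is about 0.51 rather than exactly 0.5. According to the abstract, EDIS learns both ingredients: the prior comes from a diffusion model fit to the offline dataset, and the guidance comes from energy functions.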
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems
Methods: Diffusion · Implicit Q-Learning
