Energy-Guided Diffusion Sampling for Offline-to-Online Reinforcement Learning

Xu-Hui Liu; Tian-Shuo Liu; Shengyi Jiang; Ruifeng Chen; Zhilong Zhang; Xinwei Chen; Yang Yu

arXiv:2407.12448 · cs.LG · September 5, 2024

TL;DR

This paper introduces EDIS, a diffusion-based method that uses offline data to improve online reinforcement learning, reducing data distribution shift during online fine-tuning and improving performance on MuJoCo, AntMaze, and Adroit environments.

Contribution

EDIS is a novel diffusion sampling approach that distills offline knowledge for better online data generation, compatible with existing offline-to-online RL methods.

Findings

01. EDIS achieves a 20% average performance boost on MuJoCo, AntMaze, and Adroit environments.

02. Theoretical analysis shows that EDIS has lower suboptimality than using online data alone or directly reusing offline data.

03. EDIS can be integrated with methods like Cal-QL and IQL as a plug-in enhancement.

Abstract

Combining offline and online reinforcement learning (RL) techniques is indeed crucial for achieving efficient and safe learning where data acquisition is expensive. Existing methods replay offline data directly in the online phase, resulting in a significant challenge of data distribution shift and subsequently causing inefficiency in online fine-tuning. To address this issue, we introduce an innovative approach, Energy-guided DIffusion Sampling (EDIS), which utilizes a diffusion model to extract prior knowledge from the offline dataset and employs energy functions to distill this knowledge for enhanced data generation in the online phase. The theoretical analysis demonstrates that EDIS exhibits reduced suboptimality compared to solely utilizing online data or directly reusing offline data. EDIS is a plug-in approach and can be combined with existing methods…
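
The general idea of energy-guided sampling mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (the liuxhym/edis repository listed under Code & Models is the reference for that). It assumes a 1-D standard normal as a stand-in for the prior learned from offline data, a hypothetical quadratic energy that pulls samples toward a target value, and unadjusted Langevin dynamics in place of a real diffusion reverse process. All function and parameter names are made up for illustration.

```python
import math
import random


def prior_score(x):
    # Score (gradient of log-density) of N(0, 1).
    # Assumption: stands in for a score learned by a diffusion model on offline data.
    return -x


def energy_grad(x, target=2.0, scale=0.5):
    # Gradient of the hypothetical energy E(x) = (x - target)^2 / (2 * scale^2).
    return (x - target) / scale ** 2


def guided_langevin(n_samples=2000, n_steps=300, step=0.01, seed=0):
    # Samples approximately from p(x) proportional to prior(x) * exp(-E(x)).
    # The guided score is the prior score minus the energy gradient.
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * step)
    samples = []
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)  # start each chain from the prior
        for _step in range(n_steps):
            drift = prior_score(x) - energy_grad(x)
            x += step * drift + noise_scale * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


samples = guided_langevin()
mean = sum(samples) / len(samples)
print(f"sample mean: {mean:.3f}")
```

With these toy choices the guided distribution is the product of two Gaussians, whose analytic mean is 1.6, so the sample mean should land close to that value. The point of the sketch is only that the energy term shifts samples away from the prior toward the region the energy favors.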

Code & Models

Repositories

liuxhym/edis
PyTorch · Official

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems

Methods: Diffusion · Implicit Q-Learning