Diffusion-DICE: In-Sample Diffusion Guidance for Offline Reinforcement Learning

Liyuan Mao; Haoran Xu; Xianyuan Zhan; Weinan Zhang; Amy Zhang

arXiv:2407.20109·cs.LG·November 1, 2024

TL;DR

Diffusion-DICE is a diffusion-model-based approach to offline reinforcement learning. It transforms the behavior distribution into the optimal policy distribution using only in-sample data, which avoids exploiting value function errors, and it performs strongly on benchmark datasets.

Contribution

The paper presents Diffusion-DICE, a method that uses diffusion models to transform the behavior distribution directly into the optimal policy distribution, with an in-sample learning objective for the guidance term.

Findings

01 Successfully avoids value function error exploitation.

02 Achieves strong performance on benchmark datasets.

03 Effectively handles multi-modality in policy distributions.

Abstract

One important property of DIstribution Correction Estimation (DICE) methods is that the solution is the optimal stationary distribution ratio between the optimized policy and the data collection policy. In this work, we show that DICE-based methods can be viewed as a transformation from the behavior distribution to the optimal policy distribution. Based on this, we propose a novel approach, Diffusion-DICE, that directly performs this transformation using diffusion models. We find that the optimal policy's score function can be decomposed into two terms: the behavior policy's score function and the gradient of a guidance term which depends on the optimal distribution ratio. The first term can be obtained from a diffusion model trained on the dataset, and we propose an in-sample learning objective to learn the second term. Due to the multi-modality contained in the optimal policy distribution, the…
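The score decomposition described in the abstract can be checked numerically on a toy problem. The sketch below is illustrative only and is not the authors' implementation. It assumes the optimal policy is proportional to the behavior policy times the distribution ratio w, and that the guidance term is the log of the expectation of w over clean actions given the noised action. All names and values here (`alpha`, `sigma`, the three-point action set, the weights `w`) are hypothetical choices made for the example.

```python
import numpy as np

def gaussian(a, mean, sigma):
    """Density of N(mean, sigma^2) evaluated at a."""
    return np.exp(-0.5 * ((a - mean) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def mixture_score(a, probs, means, sigma):
    """d/da log sum_i probs_i * N(a; means_i, sigma^2), computed analytically."""
    comp = probs * gaussian(a, means, sigma)
    return np.sum(comp * (-(a - means) / sigma**2)) / np.sum(comp)

def log_guidance(a, probs, means, sigma, w):
    """log E[w(a_0) | a_t = a] under the behavior posterior over clean actions."""
    comp = probs * gaussian(a, means, sigma)
    return np.log(np.sum(comp * w) / np.sum(comp))

def check_decomposition(a_t=0.3, alpha=0.8, sigma=0.6, eps=1e-5):
    # Toy 1-D setup (assumed, not from the paper): three clean actions,
    # behavior probabilities p, and a nonnegative distribution ratio w.
    x = np.array([-1.0, 0.5, 2.0])
    p = np.array([0.5, 0.3, 0.2])
    w = np.array([0.1, 1.0, 3.0])

    # Assumed optimal policy: proportional to behavior policy times w.
    q = p * w / np.sum(p * w)

    # Forward noising a_t = alpha * a_0 + sigma * noise shifts each mode to alpha * x.
    means = alpha * x

    # Left side: score of the noised optimal policy.
    lhs = mixture_score(a_t, q, means, sigma)

    # Right side: behavior score plus gradient of the guidance term
    # (central finite difference of log_guidance).
    grad_guidance = (
        log_guidance(a_t + eps, p, means, sigma, w)
        - log_guidance(a_t - eps, p, means, sigma, w)
    ) / (2 * eps)
    rhs = mixture_score(a_t, p, means, sigma) + grad_guidance
    return lhs, rhs

lhs, rhs = check_decomposition()
print(abs(lhs - rhs) < 1e-6)
```

In this toy case both sides agree up to finite-difference error, because the noised optimal density equals the noised behavior density times the posterior expectation of w, divided by a constant that vanishes under the log-gradient. In practice the behavior score would come from a trained diffusion model and the guidance term from a learned network, neither of which is modeled here.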

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Diffusion