Diffusion-DICE: In-Sample Diffusion Guidance for Offline Reinforcement Learning
Liyuan Mao, Haoran Xu, Xianyuan Zhan, Weinan Zhang, Amy Zhang

TL;DR
Diffusion-DICE introduces a novel diffusion model-based approach for offline reinforcement learning, transforming behavior distributions into optimal policies with in-sample data, avoiding value function errors and improving performance.
Contribution
The paper presents Diffusion-DICE, a new method that directly transforms behavior distributions into optimal policies using diffusion models and an in-sample learning objective.
Findings
Successfully avoids value function error exploitation.
Achieves strong performance on benchmark datasets.
Effectively handles multi-modality in policy distributions.
Abstract
One important property of DIstribution Correction Estimation (DICE) methods is that the solution is the optimal stationary distribution ratio between the optimized and data collection policy. In this work, we show that DICE-based methods can be viewed as a transformation from the behavior distribution to the optimal policy distribution. Based on this, we propose a novel approach, Diffusion-DICE, that directly performs this transformation using diffusion models. We find that the optimal policy's score function can be decomposed into two terms: the behavior policy's score function and the gradient of a guidance term which depends on the optimal distribution ratio. The first term can be obtained from a diffusion model trained on the dataset and we propose an in-sample learning objective to learn the second term. Due to the multi-modality contained in the optimal policy distribution, the…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsReinforcement Learning in Robotics
MethodsDiffusion
