Diffusion Policies with Value-Conditional Optimization for Offline Reinforcement Learning
Yunchang Ma, Tenglong Liu, Yixing Lan, Xin Yin, Changxin Zhang, Xinglong Zhang, Xin Xu

TL;DR
DIVO introduces a value-conditional diffusion approach for offline reinforcement learning. It balances conservatism and exploration by selectively generating and filtering actions according to their advantage values, which improves performance on the D4RL benchmarks.
Contribution
The paper proposes DIVO, a novel diffusion-based offline RL method that uses advantage-guided training and filtering to enhance policy quality and dataset coverage.
Findings
DIVO outperforms state-of-the-art methods on D4RL benchmarks.
DIVO achieves significant improvements in average returns on locomotion tasks.
DIVO excels in the challenging AntMaze domain with sparse rewards.
Abstract
In offline reinforcement learning, value overestimation caused by out-of-distribution (OOD) actions significantly limits policy performance. Recently, diffusion models have been leveraged for their strong distribution-matching capabilities, enforcing conservatism through behavior policy constraints. However, existing methods often apply indiscriminate regularization to redundant actions in low-quality datasets, resulting in excessive conservatism and an imbalance between the expressiveness and efficiency of diffusion modeling. To address these issues, we propose DIffusion policies with Value-conditional Optimization (DIVO), a novel approach that leverages diffusion models to generate high-quality, broadly covered in-distribution state-action samples while facilitating efficient policy improvement. Specifically, DIVO introduces a binary-weighted mechanism that utilizes the advantage…
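The abstract is cut off before it specifies the binary-weighted mechanism, so the following is only a minimal sketch of how advantage-based binary weighting and action filtering could look, not the authors' implementation. It assumes the advantage is Q(s, a) - V(s), that samples above a threshold get weight 1 and all others weight 0, and that the weights scale a standard noise-prediction loss. The function names, the `threshold` parameter, and the fallback rule in `filter_candidates` are all hypothetical.

```python
import numpy as np


def binary_advantage_weights(q_values, v_values, threshold=0.0):
    """Return 1.0 where the advantage Q(s, a) - V(s) exceeds `threshold`, else 0.0.

    Assumption: the paper's exact weighting rule is not given in the summary.
    """
    advantages = np.asarray(q_values, dtype=float) - np.asarray(v_values, dtype=float)
    return (advantages > threshold).astype(float)


def weighted_denoising_loss(eps_pred, eps_true, weights):
    """Squared noise-prediction error per sample, averaged over selected samples.

    Assumption: a plain epsilon-prediction diffusion loss, masked by binary weights.
    """
    diff = np.asarray(eps_pred, dtype=float) - np.asarray(eps_true, dtype=float)
    per_sample = np.sum(diff ** 2, axis=-1)
    weights = np.asarray(weights, dtype=float)
    # Divide by the number of selected samples; avoid division by zero.
    return float(np.sum(weights * per_sample) / max(np.sum(weights), 1.0))


def filter_candidates(candidate_actions, q_values, v_value):
    """Pick the highest-Q candidate among those with positive advantage.

    Hypothetical fallback: if no candidate has positive advantage,
    return the candidate with the highest Q overall.
    """
    q = np.asarray(q_values, dtype=float)
    keep = (q - v_value) > 0.0
    idx = np.flatnonzero(keep) if keep.any() else np.arange(len(q))
    return np.asarray(candidate_actions)[idx[np.argmax(q[idx])]]
```

In this sketch, low-advantage dataset actions contribute nothing to the diffusion loss, which is one way to avoid regularizing toward redundant actions in low-quality data. The same advantage signal is reused at inference to filter sampled actions.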
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis
