DIAR: Diffusion-model-guided Implicit Q-learning with Adaptive Revaluation
Jaehyun Park, Yunho Kim, Sejin Kim, Byung-Jun Lee, Sundong Kim

TL;DR
DIAR is a novel offline reinforcement learning framework that uses diffusion models and adaptive revaluation to improve decision-making, robustness, and generalization in long-horizon, sparse-reward tasks.
Contribution
The paper introduces DIAR, integrating diffusion models with implicit Q-learning and adaptive revaluation for enhanced offline RL performance.
Findings
Outperforms state-of-the-art algorithms in Maze2D, AntMaze, and Kitchen tasks.
Effectively handles out-of-distribution samples and long-horizon problems.
Improves policy robustness and generalization through diverse latent trajectories.
Abstract
We propose a novel offline reinforcement learning (offline RL) approach, introducing the Diffusion-model-guided Implicit Q-learning with Adaptive Revaluation (DIAR) framework. We address two key challenges in offline RL: out-of-distribution samples and long-horizon problems. We leverage diffusion models to learn state-action sequence distributions and incorporate value functions for more balanced and adaptive decision-making. DIAR introduces an Adaptive Revaluation mechanism that dynamically adjusts decision lengths by comparing current and future state values, enabling flexible long-term decision-making. Furthermore, we address Q-value overestimation by combining Q-network learning with a value function guided by a diffusion model. The diffusion model generates diverse latent trajectories, enhancing policy robustness and generalization. As demonstrated in tasks like Maze2D, AntMaze,…
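The Adaptive Revaluation idea described in the abstract (comparing current and future state values to decide how long to keep following a decision) can be illustrated with a minimal sketch. This is one plausible reading of the abstract, not the authors' implementation: the function names `should_replan` and `toy_value`, the scalar state type, and the toy value function are all hypothetical and chosen only for illustration.

```python
from typing import Callable

# Hypothetical 1-D state, used only to keep the illustration small.
State = float


def should_replan(
    value_fn: Callable[[State], float],
    current_state: State,
    predicted_future_state: State,
) -> bool:
    """Return True when the current plan is not expected to improve value.

    Assumed rule: if the value of the current state exceeds the value of the
    state the current plan is predicted to reach, the plan is cut short and
    a new decision is generated from the current state.
    """
    return value_fn(current_state) > value_fn(predicted_future_state)


def toy_value(state: State, goal: State = 10.0) -> float:
    # Toy value function (assumption): states closer to the goal score higher.
    return -abs(goal - state)
```

With this toy value function, `should_replan(toy_value, 5.0, 3.0)` is true because the predicted state moves away from the goal, while `should_replan(toy_value, 5.0, 8.0)` is false because the plan is expected to make progress. In a full system the value function would be learned and the future state would come from the generated trajectory; both are stand-ins here.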
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Neural Networks and Applications
Methods: Diffusion · Q-Learning
