Diffusion Self-Weighted Guidance for Offline Reinforcement Learning

Augusto Tagle; Javier Ruiz-del-Solar; Felipe Tobar

arXiv:2505.18345 · cs.LG · December 24, 2025

TL;DR

This paper introduces Self-Weighted Guidance (SWG), a diffusion-based method for offline reinforcement learning. SWG runs the diffusion over both actions and weights, so the scores needed for guidance come from the diffusion model itself, and it performs comparably to state-of-the-art methods on D4RL benchmarks.

Contribution

The paper proposes a novel diffusion model framework that directly incorporates weight functions for offline RL, eliminating the need for additional network training.

Findings

01  SWG generates the desired action distributions in toy examples.

02  SWG performs comparably to state-of-the-art methods on D4RL benchmarks.

03  Ablation studies validate the scalability and effectiveness of SWG.

Abstract

Offline reinforcement learning (RL) recovers the optimal policy $π$ given historical observations of an agent. In practice, $π$ is modeled as a weighted version of the agent's behavior policy $μ$, using a weight function $w$ that acts as a critic of the agent's behavior. Though recent approaches to offline RL based on diffusion models have exhibited promising results, the computation of the required scores is challenging due to their dependence on the unknown $w$. In this work, we alleviate this issue by constructing a diffusion over both the actions and the weights. With the proposed setting, the required scores are directly obtained from the diffusion model without learning extra networks. Our main conceptual contribution is a novel guidance method, where guidance (which is a function of $w$) comes from the same diffusion model; our proposal is therefore termed Self-Weighted…
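To make the phrase "a weighted version of the behavior policy" concrete, the sketch below reweights a behavior policy over a small discrete action set, reading the relation as $π(a) ∝ μ(a)\,w(a)$. This is an illustration only and not the paper's method: the discrete setting, the helper `weighted_policy`, the numbers, and the exponentiated-value weight `exp(Q / beta)` are all assumptions chosen for the example, and no diffusion model is involved.

```python
import numpy as np


def weighted_policy(mu, w):
    """Return pi proportional to mu * w over a discrete action set.

    Hypothetical helper for illustration; not taken from the paper.
    """
    mu = np.asarray(mu, dtype=float)
    w = np.asarray(w, dtype=float)
    unnormalized = mu * w
    return unnormalized / unnormalized.sum()


# Assumed behavior policy over four discrete actions.
mu = np.array([0.4, 0.3, 0.2, 0.1])

# Assumed action values and temperature. exp(Q / beta) is one common
# choice of weight function, used here purely as an example.
q_values = np.array([0.0, 1.0, 2.0, 1.0])
beta = 1.0
w = np.exp(q_values / beta)

pi = weighted_policy(mu, w)
print("behavior policy mu:", mu)
print("weighted policy pi:", pi)
```

In this toy setting the weights shift probability mass toward actions with higher assumed value, while a constant weight leaves the behavior policy unchanged. The difficulty described in the abstract arises in the continuous, diffusion-based case, where $w$ is unknown and the scores of the weighted distribution cannot be computed this directly.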

Taxonomy

Topics: Adaptive Dynamic Programming Control

Methods: Diffusion