Beyond Penalization: Diffusion-based Out-of-Distribution Detection and Selective Regularization in Offline Reinforcement Learning

Qingjun Wang; Hongtu Zhou; Hang Yu; Junqiao Zhao; Yanping Zhao; Chen Ye; Ziqiao Wang; Guang Chen

arXiv:2605.08202·cs.LG·May 12, 2026

TL;DR

DOSER is a diffusion-based framework for offline RL that detects OOD actions more accurately and regularizes them selectively, suppressing risky actions while still allowing beneficial exploration beyond the behavioral support.

Contribution

The paper proposes a diffusion-based approach that distinguishes beneficial from risky OOD actions more reliably than prior methods, which depend on restrictive assumptions about the data distribution.

Findings

01. DOSER outperforms prior methods on offline RL benchmarks.

02. It effectively suppresses risky OOD actions while promoting high-potential exploration.

03. Theoretically, DOSER has a unique fixed point with bounded value estimates.

Abstract

Offline reinforcement learning (RL) faces a critical challenge of overestimating the value of out-of-distribution (OOD) actions. Existing methods mitigate this issue by penalizing unseen samples, yet they fail to accurately identify OOD actions and may suppress beneficial exploration beyond the behavioral support. Although several methods have been proposed to differentiate OOD samples with distinct properties, they typically rely on restrictive assumptions about the data distribution and remain limited in discrimination ability. To address this problem, we propose DOSER (Diffusion-based OOD Detection and Selective Regularization), a novel framework that goes beyond uniform penalization. DOSER trains two diffusion models to capture the behavior policy and state distribution, using single-step denoising reconstruction error as a reliable OOD indicator. During policy optimization, it…
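
The single-step denoising reconstruction error mentioned in the abstract can be illustrated with a small toy example. The sketch below is not the paper's implementation: it assumes a standard DDPM-style forward process, and it replaces the trained diffusion network with a closed-form noise predictor for Gaussian data so that it runs without training. The names `gaussian_eps_predictor` and `reconstruction_error`, the noise level `alpha_bar`, and the number of noise draws are all hypothetical choices made for illustration.

```python
import numpy as np

def gaussian_eps_predictor(mu, sigma):
    """Closed-form noise predictor for data x0 ~ N(mu, sigma^2 I).

    Assumption: this stands in for a trained diffusion network.
    """
    def predict(x_t, alpha_bar):
        # Variance of the noised sample x_t under the forward process.
        var_t = alpha_bar * sigma**2 + (1.0 - alpha_bar)
        # Posterior mean of the injected noise given x_t.
        return np.sqrt(1.0 - alpha_bar) * (x_t - np.sqrt(alpha_bar) * mu) / var_t
    return predict

def reconstruction_error(x0, predict_eps, alpha_bar=0.5, n_noise=64, seed=0):
    """Squared single-step reconstruction error of x0, averaged over noise draws."""
    rng = np.random.default_rng(seed)
    x0 = np.asarray(x0, dtype=float)
    eps = rng.standard_normal((n_noise,) + x0.shape)
    # Forward noising: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps.
    x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    # One denoising step: recover an estimate of x0 from the predicted noise.
    eps_hat = predict_eps(x_t, alpha_bar)
    x0_hat = (x_t - np.sqrt(1.0 - alpha_bar) * eps_hat) / np.sqrt(alpha_bar)
    return float(np.mean(np.sum((x0_hat - x0) ** 2, axis=-1)))

# Toy "dataset" distribution centred at the origin (an assumption).
mu = np.zeros(2)
predict_eps = gaussian_eps_predictor(mu, sigma=0.1)

in_dist_score = reconstruction_error(mu + 0.05, predict_eps)  # near the data
ood_score = reconstruction_error(mu + 3.0, predict_eps)       # far from the data
```

In this toy setting the denoiser pulls every reconstruction toward the data mean, so a sample far from the data is reconstructed poorly and receives a larger score. How DOSER thresholds or combines the scores from its two diffusion models is not specified in the text above, so that step is left out.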
