TL;DR
DOSER is a diffusion-based framework for offline RL that detects out-of-distribution (OOD) actions more accurately and regularizes them selectively: it suppresses risky OOD actions while still allowing promising exploration, and it outperforms existing methods.
Contribution
The paper proposes a diffusion-based approach that separates beneficial OOD actions from risky ones more accurately than prior methods, which rely on restrictive assumptions about the data distribution.
Findings
DOSER outperforms prior methods on offline RL benchmarks.
It effectively suppresses risky OOD actions while promoting high-potential exploration.
Theoretical analysis shows that DOSER admits a unique fixed point with bounded value estimates.
Abstract
Offline reinforcement learning (RL) faces a critical challenge of overestimating the value of out-of-distribution (OOD) actions. Existing methods mitigate this issue by penalizing unseen samples, yet they fail to accurately identify OOD actions and may suppress beneficial exploration beyond the behavioral support. Although several methods have been proposed to differentiate OOD samples with distinct properties, they typically rely on restrictive assumptions about the data distribution and remain limited in discrimination ability. To address this problem, we propose DOSER (Diffusion-based OOD Detection and Selective Regularization), a novel framework that goes beyond uniform penalization. DOSER trains two diffusion models to capture the behavior policy and state distribution, using single-step denoising reconstruction error as a reliable OOD indicator. During policy optimization, it…
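To make the idea of a single-step denoising reconstruction error concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not give the scoring formula, so the sketch assumes a standard forward-noising step x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps and a squared-error score. In place of a trained diffusion model it uses the closed-form optimal denoiser for Gaussian data, which is an assumption made only to keep the example self-contained. The names `gaussian_denoiser`, `reconstruction_error`, `alpha_bar`, `mu`, and `sigma` are hypothetical.

```python
import math
import random


def gaussian_denoiser(x_t, alpha_bar, mu, sigma):
    """Posterior mean E[x_0 | x_t] when the data follow N(mu, sigma^2 I).

    Assumption: this closed form stands in for a trained diffusion
    denoiser fitted to the behavior data.
    """
    root = math.sqrt(alpha_bar)
    gain = root * sigma**2 / (alpha_bar * sigma**2 + (1.0 - alpha_bar))
    return [m + gain * (x - root * m) for x, m in zip(x_t, mu)]


def reconstruction_error(x0, alpha_bar, mu, sigma, rng, n_noise=64):
    """Average squared error between x0 and its one-step reconstruction."""
    total = 0.0
    for _ in range(n_noise):
        # Forward process: corrupt the clean sample with Gaussian noise.
        x_t = [
            math.sqrt(alpha_bar) * x
            + math.sqrt(1.0 - alpha_bar) * rng.gauss(0.0, 1.0)
            for x in x0
        ]
        # Single denoising step back to an estimate of the clean sample.
        x0_hat = gaussian_denoiser(x_t, alpha_bar, mu, sigma)
        total += sum((h - x) ** 2 for h, x in zip(x0_hat, x0))
    return total / n_noise


rng = random.Random(0)
mu = [0.0, 0.0]  # hypothetical center of the behavior data
err_in = reconstruction_error([0.1, -0.1], 0.5, mu, 0.5, rng)
err_ood = reconstruction_error([3.0, 3.0], 0.5, mu, 0.5, rng)
print("in-distribution error:", err_in)
print("out-of-distribution error:", err_ood)
```

In this toy setting the action far from the data receives a much larger error, because the denoiser pulls its reconstruction back toward the region it was fitted on. How DOSER turns such a score into an OOD decision, for example through a threshold, is not stated in the abstract.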