Diffusion Policies for Out-of-Distribution Generalization in Offline Reinforcement Learning

Suzan Ece Ada; Erhan Oztop; Emre Ugur

arXiv:2307.04726 · cs.LG · June 9, 2025

TL;DR

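This paper introduces State Reconstruction for Diffusion Policies (SRDP), a diffusion policy method that adds state reconstruction feature learning to improve out-of-distribution generalization in offline RL. It is evaluated on D4RL benchmarks and on a 2D Multimodal Contextual Bandit task realized both in simulation and on a real UR10 robot.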

Contribution

The paper proposes a novel state reconstruction technique integrated with diffusion policies to improve OOD generalization in offline RL.

Findings

1. SRDP achieves state-of-the-art results on D4RL benchmarks.
2. SRDP demonstrates significant improvement in OOD navigation tasks.
3. Ablation studies confirm the importance of state reconstruction in SRDP.

Abstract

Offline Reinforcement Learning (RL) methods leverage previous experiences to learn better policies than the behavior policy used for data collection. However, they face challenges handling distribution shifts due to the lack of online interaction during training. To this end, we propose a novel method named State Reconstruction for Diffusion Policies (SRDP) that incorporates state reconstruction feature learning in the recent class of diffusion policies to address the problem of out-of-distribution (OOD) generalization. Our method promotes learning of generalizable state representation to alleviate the distribution shift caused by OOD states. To illustrate the OOD generalization and faster convergence of SRDP, we design a novel 2D Multimodal Contextual Bandit environment and realize it on a 6-DoF real-world UR10 robot, as well as in simulation, and compare its performance with prior…
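
To make the idea in the abstract concrete, the following is a minimal sketch of how a state reconstruction objective could be combined with a diffusion policy's noise-prediction loss through a shared state representation. It is an illustration under stated assumptions, not the authors' architecture: the linear layers, the dimensions, the names `make_params` and `srdp_style_loss`, and the weight `recon_weight` are all hypothetical, and any further terms the paper may include in its objective are omitted.

```python
import math
import random


def linear(weights, x):
    """Apply a bias-free linear layer; `weights` is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def make_params(state_dim, latent_dim, action_dim, rng):
    """Randomly initialise the three hypothetical linear maps."""
    def init(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {
        "enc": init(latent_dim, state_dim),   # state -> shared representation
        "dec": init(state_dim, latent_dim),   # representation -> reconstructed state
        # noise head input: noisy action, representation, and noise level
        "eps": init(action_dim, action_dim + latent_dim + 1),
    }


def srdp_style_loss(params, state, action, alpha_bar_t, rng, recon_weight=1.0):
    """Return (total, diffusion_loss, recon_loss) for one (state, action) pair.

    alpha_bar_t is the cumulative noise-schedule product at the sampled
    diffusion step, assumed to lie in (0, 1].
    """
    # Shared state representation used by both heads.
    z = [math.tanh(v) for v in linear(params["enc"], state)]

    # Auxiliary head: reconstruct the input state from the representation.
    recon_loss = mse(linear(params["dec"], z), state)

    # Standard forward diffusion on the action:
    # a_t = sqrt(alpha_bar_t) * a_0 + sqrt(1 - alpha_bar_t) * eps
    eps = [rng.gauss(0.0, 1.0) for _ in action]
    noisy_action = [
        math.sqrt(alpha_bar_t) * a + math.sqrt(1.0 - alpha_bar_t) * e
        for a, e in zip(action, eps)
    ]

    # Noise-prediction head conditioned on the state representation.
    eps_hat = linear(params["eps"], noisy_action + z + [alpha_bar_t])
    diffusion_loss = mse(eps_hat, eps)

    total = diffusion_loss + recon_weight * recon_loss
    return total, diffusion_loss, recon_loss


if __name__ == "__main__":
    rng = random.Random(0)
    params = make_params(state_dim=4, latent_dim=3, action_dim=2, rng=rng)
    losses = srdp_style_loss(params, [0.5, -0.2, 0.1, 0.9], [0.3, -0.7], 0.6, rng)
    print("total, diffusion, reconstruction:", losses)
```

Because both heads read the same representation `z`, gradients from the reconstruction term would shape the features the policy conditions on, which is the intuition behind using reconstruction to encourage a more generalizable state representation. Setting `recon_weight` to zero in this sketch recovers a plain conditional noise-prediction loss.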

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Diffusion