Diffusion Policies for Out-of-Distribution Generalization in Offline Reinforcement Learning
Suzan Ece Ada, Erhan Oztop, Emre Ugur

TL;DR
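This paper introduces SRDP (State Reconstruction for Diffusion Policies), a diffusion policy method that adds state reconstruction feature learning to improve out-of-distribution (OOD) generalization in offline RL. It is evaluated on D4RL benchmarks and on a 2D Multimodal Contextual Bandit task realized in simulation and on a real 6-DoF UR10 robot.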
Contribution
The paper proposes a novel state reconstruction technique integrated with diffusion policies to improve OOD generalization in offline RL.
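To make the idea of combining state reconstruction with a diffusion policy concrete, here is a minimal NumPy sketch of one plausible training objective: a standard noise-prediction (DDPM-style) loss on actions, conditioned on an encoded state, plus an auxiliary loss for reconstructing the state from that same encoding. Everything specific here is an assumption for illustration and is not taken from the paper: the linear "networks", the shared encoder, the linear noise schedule, the weight `lam`, and the function name `srdp_style_loss` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for a toy 2D problem; T is the number of diffusion steps.
STATE_DIM, ACTION_DIM, LATENT_DIM, T = 2, 2, 8, 50

# Linear noise schedule (a common DDPM choice, assumed here).
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)

# Single linear layers stand in for neural networks (illustrative only).
params = {
    "enc": rng.normal(scale=0.1, size=(STATE_DIM, LATENT_DIM)),
    "dec": rng.normal(scale=0.1, size=(LATENT_DIM, STATE_DIM)),
    "eps": rng.normal(scale=0.1, size=(LATENT_DIM + ACTION_DIM + 1, ACTION_DIM)),
}


def srdp_style_loss(params, states, actions, rng, lam=1.0):
    """Return (total, diffusion_loss, recon_loss) for one batch.

    total = diffusion_loss + lam * recon_loss, where `lam` is an assumed
    weighting between the two terms.
    """
    batch = states.shape[0]

    # Shared state encoding, used by both the policy and the decoder.
    z = np.tanh(states @ params["enc"])

    # Auxiliary objective: reconstruct the state from its encoding.
    recon = z @ params["dec"]
    recon_loss = np.mean((recon - states) ** 2)

    # Forward diffusion: corrupt dataset actions at a random timestep.
    t = rng.integers(0, T, size=batch)
    noise = rng.normal(size=actions.shape)
    ab = alpha_bars[t][:, None]
    noisy_actions = np.sqrt(ab) * actions + np.sqrt(1.0 - ab) * noise

    # Noise prediction conditioned on the state encoding and the timestep.
    inp = np.concatenate([z, noisy_actions, (t / T)[:, None]], axis=1)
    eps_pred = inp @ params["eps"]
    diffusion_loss = np.mean((eps_pred - noise) ** 2)

    return diffusion_loss + lam * recon_loss, diffusion_loss, recon_loss


# Toy offline batch of (state, action) pairs.
states = rng.uniform(-1, 1, size=(16, STATE_DIM))
actions = rng.uniform(-1, 1, size=(16, ACTION_DIM))
total, d_loss, r_loss = srdp_style_loss(params, states, actions, rng)
```

In this sketch the reconstruction term only shapes the shared encoding; setting `lam=0.0` recovers a plain diffusion behavior-cloning loss, which is one simple way to run the kind of ablation mentioned under Findings.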
Findings
SRDP achieves state-of-the-art results on D4RL benchmarks.
SRDP demonstrates significant improvement in OOD navigation tasks.
Ablation studies confirm the importance of state reconstruction in SRDP.
Abstract
Offline Reinforcement Learning (RL) methods leverage previous experiences to learn better policies than the behavior policy used for data collection. However, they face challenges handling distribution shifts due to the lack of online interaction during training. To this end, we propose a novel method named State Reconstruction for Diffusion Policies (SRDP) that incorporates state reconstruction feature learning in the recent class of diffusion policies to address the problem of out-of-distribution (OOD) generalization. Our method promotes learning of generalizable state representation to alleviate the distribution shift caused by OOD states. To illustrate the OOD generalization and faster convergence of SRDP, we design a novel 2D Multimodal Contextual Bandit environment and realize it on a 6-DoF real-world UR10 robot, as well as in simulation, and compare its performance with prior…
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Diffusion
