Steering Your Diffusion Policy with Latent Space Reinforcement Learning

Andrew Wagenmaker; Mitsuhiko Nakamoto; Yunchu Zhang; Seohong Park; Waleed Yagoub; Anusha Nagabandi; Abhishek Gupta; Sergey Levine

arXiv:2506.15799 · cs.RO · June 27, 2025

TL;DR

This paper introduces DSRL, a method for efficiently adapting diffusion-based robotic control policies through reinforcement learning in the latent space, enabling fast autonomous policy improvement without modifying the original policy weights.

Contribution

The paper presents diffusion steering via reinforcement learning (DSRL), a novel approach that enables sample-efficient, real-world policy adaptation by operating over the latent noise space of diffusion policies.
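To make the idea of acting in the latent noise space concrete, here is a minimal toy sketch, not the paper's implementation. Everything in it is an assumption made for illustration: `frozen_base_policy` is a hypothetical one-dimensional stand-in for a pretrained diffusion policy's denoising chain, `toy_reward` is an invented objective, and plain random search stands in for whatever RL algorithm the paper actually uses. The only point it illustrates is that behavior can be changed by learning which initial noise to feed the base policy, while the base policy itself is never updated.

```python
import random

STATES = [-1.0, -0.5, 0.0, 0.5, 1.0]  # toy evaluation states (assumed)


def frozen_base_policy(state, noise, steps=10):
    """Hypothetical stand-in for a pretrained diffusion policy.

    Deterministically "denoises" the initial noise toward a
    state-dependent mode. Nothing in here is ever trained or modified.
    """
    action = noise
    for _ in range(steps):
        action += 0.1 * (state - action)
    return action


def toy_reward(state, action):
    """Invented task: the desired action is state + 1.0."""
    return -(action - (state + 1.0)) ** 2


def noise_policy(theta, state):
    """Steering policy: picks the initial noise instead of sampling it."""
    return theta[0] + theta[1] * state


def average_return(theta):
    """Mean reward when the frozen policy is driven by the chosen noise."""
    total = 0.0
    for s in STATES:
        w = noise_policy(theta, s)        # action in the latent noise space
        a = frozen_base_policy(s, w)      # base policy decodes it, unchanged
        total += toy_reward(s, a)
    return total / len(STATES)


def steer(iterations=200, seed=0, step_size=0.5):
    """Random-search stand-in for RL: only theta is ever updated."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    best = average_return(theta)
    for _ in range(iterations):
        candidate = [t + rng.gauss(0.0, step_size) for t in theta]
        score = average_return(candidate)
        if score > best:                  # keep only improvements
            theta, best = candidate, score
    return theta, best
```

Calling `steer()` returns the learned noise-policy parameters and their average return, which can be compared against `average_return([0.0, 0.0])`, the return obtained with unsteered zero noise. In this toy the improvement comes entirely from the choice of input noise.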

Findings

01. DSRL achieves high sample efficiency in both simulated and real-world tasks.

02. It enables autonomous policy improvement without modifying the base policy's weights.

03. It is effective at adapting pretrained generalist policies.

Abstract

Robotic control policies learned from human demonstrations have achieved impressive results in many real-world applications. However, in scenarios where initial performance is not satisfactory, as is often the case in novel open-world settings, such behavioral cloning (BC)-learned policies typically require collecting additional human demonstrations to further improve their behavior -- an expensive and time-consuming process. In contrast, reinforcement learning (RL) holds the promise of enabling autonomous online policy improvement, but often falls short of achieving this due to the large number of samples it typically requires. In this work we take steps towards enabling fast autonomous adaptation of BC-trained policies via efficient real-world RL. Focusing in particular on diffusion policies -- a state-of-the-art BC methodology -- we propose diffusion steering via reinforcement…

Taxonomy

Topics: Traffic Prediction and Management Techniques

Methods: Diffusion · Balanced Selection