Steering Your Diffusion Policy with Latent Space Reinforcement Learning
Andrew Wagenmaker, Mitsuhiko Nakamoto, Yunchu Zhang, Seohong Park, Waleed Yagoub, Anusha Nagabandi, Abhishek Gupta, Sergey Levine

TL;DR
This paper introduces DSRL, a method for efficiently adapting diffusion-based robotic control policies through reinforcement learning in the latent space, enabling fast autonomous policy improvement without modifying the original policy weights.
Contribution
The paper presents diffusion steering via reinforcement learning (DSRL), a novel approach that enables sample-efficient, real-world policy adaptation by operating over the latent noise space of diffusion policies.
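The idea of operating over the latent noise space can be illustrated with a minimal sketch. This is not the authors' implementation: the linear "denoiser", the reward, the linear latent policy, and every name below are hypothetical, and simple hill climbing stands in for the RL algorithm. It only shows the structure: the base policy stays frozen, and what gets optimized is the choice of the input noise.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM = 3, 2

# Frozen stand-in for a pretrained diffusion policy (hypothetical toy weights).
BASE_W = rng.normal(size=(ACTION_DIM, STATE_DIM))
BASE_W.setflags(write=False)  # the base policy is never updated


def base_policy(state, noise, steps=5):
    """Toy deterministic denoiser: maps (state, initial noise) to an action."""
    action = noise.copy()
    target = np.tanh(BASE_W @ state)
    for _ in range(steps):
        # Each step pulls the sample partway toward a state-dependent mode,
        # so the initial noise still influences the final action.
        action = action + 0.2 * (target - action)
    return action


def reward(state, action):
    """Hypothetical task reward: prefer actions near a fixed goal."""
    goal = np.array([0.5, -0.5])
    return -float(np.sum((action - goal) ** 2))


def evaluate(latent_params, states):
    """Average reward when a linear latent policy picks the input noise."""
    total = 0.0
    for s in states:
        noise = latent_params @ s  # the steering policy acts in noise space
        total += reward(s, base_policy(s, noise))
    return total / len(states)


states = rng.normal(size=(32, STATE_DIM))
base_snapshot = BASE_W.copy()

# Start from zero noise; hill climbing is an assumption standing in for RL.
latent_params = np.zeros((ACTION_DIM, STATE_DIM))
initial_score = evaluate(latent_params, states)
best_score = initial_score
for _ in range(200):
    candidate = latent_params + 0.1 * rng.normal(size=latent_params.shape)
    score = evaluate(candidate, states)
    if score > best_score:
        latent_params, best_score = candidate, score

print(f"reward before steering: {initial_score:.3f}, after: {best_score:.3f}")
```

Only `latent_params` changes during the search; `BASE_W` is read-only throughout, which mirrors the claim that the base policy weights are left unmodified.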
Findings
DSRL achieves high sample efficiency in simulated and real-world tasks.
It enables autonomous policy improvement without modifying base policy weights.
It is effective at adapting pretrained generalist policies.
Abstract
Robotic control policies learned from human demonstrations have achieved impressive results in many real-world applications. However, in scenarios where initial performance is not satisfactory, as is often the case in novel open-world settings, such behavioral cloning (BC)-learned policies typically require collecting additional human demonstrations to further improve their behavior -- an expensive and time-consuming process. In contrast, reinforcement learning (RL) holds the promise of enabling autonomous online policy improvement, but often falls short of achieving this due to the large number of samples it typically requires. In this work we take steps towards enabling fast autonomous adaptation of BC-trained policies via efficient real-world RL. Focusing in particular on diffusion policies -- a state-of-the-art BC methodology -- we propose diffusion steering via reinforcement…
Taxonomy
Topics: Traffic Prediction and Management Techniques
Methods: Diffusion · Balanced Selection
