Enhancing Exploration with Diffusion Policies in Hybrid Off-Policy RL: Application to Non-Prehensile Manipulation

Huy Le; Tai Hoang; Miroslav Gabriel; Gerhard Neumann; and Ngo Anh Vien

arXiv:2411.14913·cs.RO·April 29, 2025

Open Access

TL;DR

This paper introduces HyDo, a hybrid off-policy reinforcement learning approach that models the continuous motion parameters with a diffusion policy and adds a maximum entropy objective to encourage exploration. It reports higher success rates on non-prehensile manipulation tasks, both in simulation and in the real world.

Contribution

It presents a novel hybrid framework combining diffusion policies with maximum entropy RL, enabling more diverse exploration in both discrete and continuous action spaces.

Findings

01

HyDo achieves higher success rates in manipulation tasks.

02

Diffusion-based policies promote diverse behaviors.

03

Significant improvement in real-world task performance.

Abstract

Learning diverse policies for non-prehensile manipulation is essential for improving skill transfer and generalization to out-of-distribution scenarios. In this work, we enhance exploration through a two-fold approach within a hybrid framework that tackles both discrete and continuous action spaces. First, we model the continuous motion parameter policy as a diffusion model, and second, we incorporate this into a maximum entropy reinforcement learning framework that unifies both the discrete and continuous components. The discrete action space, such as contact point selection, is optimized through Q-value function maximization, while the continuous part is guided by a diffusion-based policy. This hybrid approach leads to a principled objective, where the maximum entropy term is derived as a lower bound using structured variational inference. We propose the Hybrid Diffusion Policy…
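
The hybrid action selection described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: `toy_denoiser` and `toy_q` are hypothetical stand-ins for learned networks, the denoising loop is a simplified update rather than a proper diffusion schedule, and sampling the discrete action from a softmax over Q-values with temperature `alpha` is one common maximum entropy choice that is assumed here, not taken from the paper.

```python
import numpy as np


def toy_denoiser(state, action, t, location):
    """Hypothetical stand-in for a learned noise-prediction network."""
    # Pretend each discrete location has its own preferred motion target.
    target = np.tanh(state[: action.shape[0]] + 0.1 * location)
    return action - target


def sample_motion(state, location, rng, n_steps=10, action_dim=3):
    """Draw continuous motion parameters by iterative denoising (simplified)."""
    action = rng.standard_normal(action_dim)  # start from pure noise
    for t in range(n_steps, 0, -1):
        eps = toy_denoiser(state, action, t, location)
        action = action - eps / t  # assumed update rule, not a DDPM schedule
        if t > 1:
            # Re-inject a little noise on all but the final step.
            action = action + 0.05 * rng.standard_normal(action_dim)
    return np.clip(action, -1.0, 1.0)


def toy_q(state, location, motion):
    """Hypothetical critic: scores a (location, motion) pair for a state."""
    return -float(np.sum((motion - 0.2 * location) ** 2))


def select_hybrid_action(state, n_locations, alpha, rng):
    """Pick a discrete location and its continuous motion parameters."""
    # One continuous sample per candidate discrete action.
    motions = [sample_motion(state, k, rng) for k in range(n_locations)]
    q_values = np.array([toy_q(state, k, m) for k, m in enumerate(motions)])

    # Softmax over Q-values; a small alpha approaches greedy maximization.
    logits = q_values / alpha
    logits = logits - logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()

    k = int(rng.choice(n_locations, p=probs))
    return k, motions[k], probs


rng = np.random.default_rng(0)
state = np.zeros(4)
location, motion, probs = select_hybrid_action(state, n_locations=5, alpha=0.5, rng=rng)
print(location, motion, probs)
```

In this sketch, a larger `alpha` spreads probability across more contact locations, while the noise injected during denoising keeps the continuous samples varied. Both effects are meant only to illustrate the kind of exploration the abstract describes.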

Taxonomy

Topics: Extremum Seeking Control Systems · Iterative Learning Control Systems · Neuroscience and Neural Engineering