H$^3$DP: Triply-Hierarchical Diffusion Policy for Visuomotor Learning

Yiyang Lu; Yufeng Tian; Zhecheng Yuan; Xianbang Wang; Pu Hua; Zhengrong Xue; Huazhe Xu

arXiv:2505.07819·cs.RO·June 18, 2025

Open Access

TL;DR

H$^3$DP introduces a triply-hierarchical diffusion framework that enhances visuomotor learning by explicitly integrating multi-level visual features with action generation, leading to significant improvements in robotic manipulation tasks.

Contribution

The paper proposes a novel triply-hierarchical diffusion policy that explicitly models hierarchical visual features and their coupling with action generation in visuomotor learning.

Findings

1. Achieves a +27.5% improvement over baselines on simulation tasks.
2. Outperforms baselines on 4 challenging real-world bimanual manipulation tasks.
3. Integrates multi-scale visual features with diffusion-based action generation.

Abstract

Visuomotor policy learning has witnessed substantial progress in robotic manipulation, with recent approaches predominantly relying on generative models to model the action distribution. However, these methods often overlook the critical coupling between visual perception and action prediction. In this work, we introduce Triply-Hierarchical Diffusion Policy (H$^3$DP), a novel visuomotor learning framework that explicitly incorporates hierarchical structures to strengthen the integration between visual features and action generation. H$^3$DP contains 3 levels of hierarchy: (1) depth-aware input layering that organizes RGB-D observations based on depth information; (2) multi-scale visual representations that encode semantic features at varying levels of granularity; and (3) a hierarchically conditioned diffusion process that aligns the…
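
To make the first level of the hierarchy concrete, here is a minimal sketch of depth-aware input layering in NumPy. It is an illustration only, not the authors' implementation: the function name `depth_layering`, the `num_layers` parameter, and the equal-width binning rule are all assumptions, since the abstract only states that RGB-D observations are organized by depth.

```python
import numpy as np


def depth_layering(rgb, depth, num_layers=3):
    """Split an RGB image into depth-ordered layers (hypothetical helper).

    rgb:   (H, W, 3) array of colour values.
    depth: (H, W) array of per-pixel depth.
    Returns an array of shape (num_layers, H, W, 3); layer k keeps the
    pixels whose depth falls into bin k and is zero everywhere else.
    """
    # Assumption: equal-width bins between the nearest and farthest pixel.
    edges = np.linspace(depth.min(), depth.max(), num_layers + 1)
    # Using only the interior edges yields bin indices in 0..num_layers-1.
    bin_index = np.digitize(depth, edges[1:-1])

    layers = np.zeros((num_layers,) + rgb.shape, dtype=rgb.dtype)
    for k in range(num_layers):
        mask = bin_index == k
        layers[k][mask] = rgb[mask]  # copy only the pixels in this depth bin
    return layers


# Tiny usage example: a 2x2 image with four distinct depths.
rgb = np.ones((2, 2, 3), dtype=np.float32)
depth = np.array([[0.0, 1.0], [2.0, 3.0]], dtype=np.float32)
layers = depth_layering(rgb, depth, num_layers=3)
```

Because every pixel lands in exactly one bin, summing the layers along the first axis recovers the original image. A downstream encoder could then process each layer separately, which is one plausible way to expose foreground and background structure to the policy.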

Taxonomy

Topics: Reinforcement Learning in Robotics · Neural Networks and Applications

Methods: Diffusion