Hydra-DP3: Frequency-Aware Right-Sizing of 3D Diffusion Policies for Visuomotor Control
Jinhao Zhang, Zhexuan Zhou, Huizhe Li, Yichen Lai, Wenlong Xia, Haoming Song, Youmin Gong, Jie Mei

TL;DR
Hydra-DP3 (HDP3) is a lightweight, frequency-aware 3D diffusion policy for visuomotor control. It exploits the low-frequency dominance of robot action trajectories to achieve state-of-the-art results with fewer parameters and faster inference.
Contribution
The paper analyzes 3D diffusion policies from a frequency-domain perspective and uses that analysis to motivate a simplified, efficient denoising model with two-step inference that outperforms prior methods.
Findings
HDP3 achieves state-of-the-art performance across multiple benchmarks.
Two-step denoising suffices for high-quality policy inference.
HDP3 uses less than 1% of the parameters of previous models.
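To make the two-step finding concrete, here is a minimal sketch of deterministic DDIM sampling (eta = 0) with only two reverse steps. It is not the HDP3 implementation: the linear beta schedule, the 100-step training horizon, the chosen timesteps, the action-chunk shape, and the `oracle_eps` stand-in for a trained denoiser are all assumptions made for illustration.

```python
import numpy as np

def ddim_sample(eps_model, x_T, alpha_bars, timesteps):
    """Deterministic DDIM (eta = 0) over a short, descending list of timesteps."""
    x = x_T
    for i, t in enumerate(timesteps):
        a_t = alpha_bars[t]
        # After the last listed step we jump to the clean sample (alpha_bar = 1).
        a_prev = alpha_bars[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        eps = eps_model(x, t)
        # Clean-sample estimate implied by the predicted noise.
        x0_pred = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        # Re-noise the estimate to the next (lower) noise level.
        x = np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps
    return x

# Assumed schedule: linear betas over 100 training steps (hypothetical values).
betas = np.linspace(1e-4, 0.02, 100)
alpha_bars = np.cumprod(1.0 - betas)

# Hypothetical action chunk: 16 timesteps x 7 action dimensions.
rng = np.random.default_rng(0)
target = rng.standard_normal((16, 7))

def oracle_eps(x, t):
    """Stand-in for a trained denoiser: returns the exact noise for a known target."""
    return (x - np.sqrt(alpha_bars[t]) * target) / np.sqrt(1.0 - alpha_bars[t])

x_T = rng.standard_normal(target.shape)
sample = ddim_sample(oracle_eps, x_T, alpha_bars, timesteps=[99, 49])
```

With the oracle denoiser the two-step sampler recovers `target` exactly, which only checks that the update rule is wired correctly. With a learned denoiser, sample quality depends on how accurate its noise prediction is at the two chosen noise levels.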
Abstract
Diffusion-based visuomotor policies perform well in robotic manipulation, yet current methods still inherit image-generation-style decoders and multi-step sampling. We revisit this design from a frequency-domain perspective. Robot action trajectories are highly smooth, with most energy concentrated in a few low-frequency discrete cosine transform modes. Under this structure, we show that the error of the optimal denoiser is bounded by the low-frequency subspace dimension and residual high-frequency energy, implying that denoising error saturates after very few reverse steps. This also suggests that action denoising requires a much simpler denoising model than image generation. Motivated by this insight, we propose Hydra-DP3 (HDP3), a pocket-scale 3D diffusion policy with a lightweight Diffusion Mixer decoder that supports two-step DDIM inference. Our synthetic experiments validate the…
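The low-frequency claim in the abstract can be illustrated by measuring how much of a trajectory's energy falls in its first few discrete cosine transform modes. The sketch below is an illustration under assumptions, not the paper's experiment: the synthetic trajectory, the 16-step horizon, the cutoff of 4 modes, and the helper names `dct_matrix` and `low_freq_energy_fraction` are all hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of shape (n, n); row k is the k-th cosine mode."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    m[0, :] /= np.sqrt(2.0)  # rescale the constant mode so the rows are orthonormal
    return m

def low_freq_energy_fraction(traj, k):
    """Fraction of total energy in the first k DCT modes along the time axis."""
    coeffs = dct_matrix(traj.shape[0]) @ traj
    return float(np.sum(coeffs[:k] ** 2) / np.sum(coeffs ** 2))

# Hypothetical smooth action trajectory: 16 timesteps x 3 action dimensions.
t = np.linspace(0.0, 1.0, 16)
smooth = np.stack([np.sin(np.pi * t), t ** 2, 1.0 - t], axis=1)

# White noise of the same shape, for comparison.
rng = np.random.default_rng(0)
noise = rng.standard_normal(smooth.shape)

smooth_frac = low_freq_energy_fraction(smooth, 4)
noise_frac = low_freq_energy_fraction(noise, 4)
print(f"smooth: {smooth_frac:.3f}  noise: {noise_frac:.3f}")
```

For the smooth trajectory nearly all energy sits in the first four modes, whereas white noise spreads its energy roughly evenly across all sixteen. Real robot action data would need to be checked the same way before drawing conclusions about a specific task.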
