TL;DR
D2PPO introduces dispersive loss regularization to diffusion policies, improving their ability to distinguish subtle differences in observations for complex robotic manipulation tasks.
Contribution
The paper proposes D2PPO, which adds a dispersive loss regularizer to diffusion policies to counter representation collapse, improving performance on complex manipulation tasks.
Findings
D2PPO achieves a 22.7% improvement in pre-training on RoboMimic.
D2PPO achieves a 26.1% improvement after fine-tuning.
Real-world experiments show high success rates, especially in complex tasks.
Abstract
Diffusion policies excel at robotic manipulation by naturally modeling multimodal action distributions in high-dimensional spaces. Nevertheless, diffusion policies suffer from diffusion representation collapse: semantically similar observations are mapped to indistinguishable features, ultimately impairing their ability to handle subtle but critical variations required for complex robotic manipulation. To address this problem, we propose D2PPO (Diffusion Policy Policy Optimization with Dispersive Loss). D2PPO introduces dispersive loss regularization that combats representation collapse by treating all hidden representations within each batch as negative pairs. D2PPO compels the network to learn discriminative representations of similar observations, thereby enabling the policy to identify subtle yet crucial differences necessary for precise manipulation. In evaluation, we find that…
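To make the idea of "treating all hidden representations within each batch as negative pairs" concrete, here is a minimal NumPy sketch of an InfoNCE-style dispersive loss. It assumes the formulation of Wang and He (2025): L_disp = log( (1/B²) Σ_{i,j} exp(−‖z_i − z_j‖² / τ) ), computed over all B×B pairs with squared Euclidean distance and temperature τ, and with no positive pairs. The function name `dispersive_loss`, the default `tau`, and the flattening of each sample's features are illustrative assumptions. This summary does not specify which layers D2PPO regularizes or which loss variant it uses.

```python
import numpy as np

def dispersive_loss(z: np.ndarray, tau: float = 0.5) -> float:
    """InfoNCE-style dispersive loss over a batch (hypothetical sketch).

    z:   hidden representations, shape (B, ...); each sample is flattened.
    tau: temperature (assumed default, not taken from the paper).
    Every pair (i, j) in the batch acts as a negative pair.
    """
    z = z.reshape(z.shape[0], -1)
    # Pairwise squared Euclidean distances, shape (B, B).
    sq_norms = np.sum(z * z, axis=1)
    d = sq_norms[:, None] + sq_norms[None, :] - 2.0 * z @ z.T
    d = np.maximum(d, 0.0)  # clip tiny negatives caused by rounding
    # log of the mean of exp(-d / tau) over all B*B pairs (log-sum-exp trick).
    logits = -d.ravel() / tau
    m = logits.max()
    return float(m + np.log(np.mean(np.exp(logits - m))))

# Usage: a fully collapsed batch versus a spread-out batch.
rng = np.random.default_rng(0)
collapsed = np.ones((8, 16))
spread = rng.normal(size=(8, 16))
print(dispersive_loss(collapsed), dispersive_loss(spread))
```

Because every distance is non-negative, the loss is at most 0 and reaches that maximum only when all representations coincide, which is exactly the collapse the abstract describes. Minimizing it alongside the policy's diffusion objective, for example `diffusion_loss + lam * dispersive_loss(h)` with a hypothetical weight `lam`, pushes the features of similar observations apart.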
