Mini Diffuser: Fast Multi-task Diffusion Policy Training Using Two-level Mini-batches
Yutong Hu, Pinhao Song, Kehan Wen, Renaud Detry

TL;DR
Mini Diffuser introduces a two-level minibatching technique for multi-task diffusion policies that cuts training time and memory by roughly an order of magnitude while retaining most of the state-of-the-art performance on robotic vision-language tasks.
Contribution
The paper proposes a novel two-level minibatching approach and architectural modifications to diffusion transformers, enabling efficient multi-task diffusion policy training.
Findings
Achieves 95% of state-of-the-art performance on RLBench simulations.
Uses only 5% of the training time of previous methods.
Requires just 7% of the memory of existing diffusion policy models.
Abstract
We present a method that reduces, by an order of magnitude, the time and memory needed to train multi-task vision-language robotic diffusion policies. This improvement arises from a previously underexplored distinction between action diffusion and the image diffusion techniques that inspired it: In image generation, the target is high-dimensional. By contrast, in action generation, the dimensionality of the target is comparatively small, and only the image condition is high-dimensional. Our approach, Mini Diffuser, exploits this asymmetry by introducing two-level minibatching, which pairs multiple noised action samples with each vision-language condition, instead of the conventional one-to-one sampling strategy. To support this batching scheme, we introduce architectural adaptations to the diffusion transformer that prevent information leakage across samples while…
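The pairing described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`two_level_minibatch`, `leak_free_mask`), the cosine noise schedule, the tensor shapes, and the attention-mask layout are all assumptions chosen for illustration. The first function draws B conditions (level 1) and K noised action samples per condition (level 2). The second shows one hypothetical way to keep the K samples from seeing each other inside a shared attention layer.

```python
import numpy as np


def two_level_minibatch(conditions, actions, num_conditions,
                        samples_per_condition, num_steps, rng):
    """Build a (B, K) batch: B conditions, K noised action samples for each.

    conditions: (N, cond_dim) stand-in for encoded vision-language inputs.
    actions:    (N, action_dim) clean target actions, one per condition.
    """
    # Level 1: sample B (condition, action) pairs, as in ordinary minibatching.
    idx = rng.choice(len(conditions), size=num_conditions, replace=False)
    cond = conditions[idx]                      # (B, cond_dim)
    clean = actions[idx]                        # (B, action_dim)

    # Level 2: for every pair, sample K timesteps and K noise vectors.
    shape = (num_conditions, samples_per_condition)
    t = rng.integers(0, num_steps, size=shape)  # (B, K)
    noise = rng.standard_normal(shape + (clean.shape[-1],))  # (B, K, action_dim)

    # Toy cosine schedule (an assumption, not taken from the paper).
    alpha_bar = np.cos(0.5 * np.pi * (t + 1) / num_steps) ** 2
    noised = (np.sqrt(alpha_bar)[..., None] * clean[:, None, :]
              + np.sqrt(1.0 - alpha_bar)[..., None] * noise)
    return cond, noised, t, noise


def leak_free_mask(num_cond_tokens, samples_per_condition):
    """Boolean attention mask (True = may attend) for one condition.

    Hypothetical layout: condition tokens first, then one token per sample.
    Every token sees the condition tokens; each sample token also sees
    itself, but never another sample or vice versa.
    """
    n = num_cond_tokens + samples_per_condition
    mask = np.zeros((n, n), dtype=bool)
    mask[:, :num_cond_tokens] = True            # all rows attend to the condition
    mask[np.arange(n), np.arange(n)] = True     # each token attends to itself
    return mask


rng = np.random.default_rng(0)
conditions = rng.standard_normal((10, 16))      # placeholder condition features
actions = rng.standard_normal((10, 7))          # placeholder 7-D actions
cond, noised, t, noise = two_level_minibatch(conditions, actions, 4, 8, 100, rng)
mask = leak_free_mask(num_cond_tokens=3, samples_per_condition=8)
```

Under these assumptions, the expensive condition features are computed for B = 4 inputs while the denoiser receives B × K = 32 noised action targets, which is the asymmetry the abstract exploits.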
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning
Methods: Diffusion
