DNAct: Diffusion Guided Multi-Task 3D Policy Learning
Ge Yan, Yueh-Hua Wu, Xiaolong Wang

TL;DR
DNAct is a multi-task 3D policy learning framework that combines neural rendering pre-training and diffusion training to improve semantic understanding and robustness in robotic manipulation, achieving over 30% higher success rates than state-of-the-art NeRF-based methods.
Contribution
The paper introduces DNAct, which integrates neural rendering pre-training with diffusion training to enforce multi-modality learning in action sequence spaces, improving the generalization of multi-task robotic policies.
Findings
Over 30% improvement in success rate over SOTA NeRF-based methods
Effective distillation of 2D semantics into 3D space using foundation models
Enhanced robustness and generalization in multi-task robotic manipulation
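Distilling 2D foundation-model features into 3D through neural rendering is commonly done by volume-rendering a learned feature field along camera rays and regressing the result onto the 2D features at the corresponding pixels. The sketch below shows that generic NeRF-style quadrature and an L2 distillation loss; it is a minimal illustration under assumptions, and the function names, array shapes, and the choice of a squared L2 loss are hypothetical rather than taken from the DNAct implementation.

```python
import numpy as np

def render_ray_features(sigma, feats, deltas):
    """Volume-render per-sample features along one ray (standard NeRF quadrature).

    sigma:  (N,) non-negative densities at N samples along the ray
    feats:  (N, D) feature vectors predicted at those samples (hypothetical feature field)
    deltas: (N,) distances between consecutive samples
    Returns the (D,) rendered feature and the (N,) compositing weights.
    """
    alpha = 1.0 - np.exp(-sigma * deltas)  # opacity contributed by each segment
    # Transmittance T_i = exp(-sum_{j<i} sigma_j * delta_j): fraction of light reaching sample i
    accum = np.concatenate([[0.0], np.cumsum(sigma * deltas)[:-1]])
    trans = np.exp(-accum)
    weights = trans * alpha  # w_i = T_i * alpha_i
    return weights @ feats, weights

def distillation_loss(rendered, target):
    """Squared L2 distance between a rendered 3D feature and a 2D teacher feature."""
    return float(np.sum((rendered - target) ** 2))

# Example: one ray with 64 samples and 8-dimensional features.
rng = np.random.default_rng(0)
sigma = rng.uniform(0.0, 2.0, size=64)
feats = rng.standard_normal((64, 8))
deltas = np.full(64, 0.05)
rendered, weights = render_ray_features(sigma, feats, deltas)
teacher = rng.standard_normal(8)  # stand-in for a 2D foundation-model feature at this pixel
loss = distillation_loss(rendered, teacher)
```

In a real pipeline the teacher feature would come from a 2D foundation model such as Stable Diffusion, as the abstract states; here it is a random array so the sketch stays self-contained.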
Abstract
This paper presents DNAct, a language-conditioned multi-task policy framework that integrates neural rendering pre-training and diffusion training to enforce multi-modality learning in action sequence spaces. To learn a generalizable multi-task policy with few demonstrations, the pre-training phase of DNAct leverages neural rendering to distill 2D semantic features from foundation models such as Stable Diffusion to a 3D space, which provides a comprehensive semantic understanding of the scene. Consequently, it can be applied to challenging robotic tasks requiring rich 3D semantics and accurate geometry. Furthermore, we introduce a novel approach utilizing diffusion training to learn a vision and language feature that encapsulates the inherent multi-modality in the multi-task demonstrations. By reconstructing the action sequences from different tasks via the diffusion…
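The diffusion-training idea in the abstract, reconstructing action sequences so that a learned vision-language feature captures the multi-modality of the demonstrations, can be illustrated with the standard DDPM noise-prediction objective. This is a minimal sketch under assumptions: a linear beta schedule, an action sequence stored as an (H, A) array, and a hypothetical `denoiser(x_t, t, cond)` callable standing in for a network conditioned on the vision-language feature. It is not the DNAct architecture or training code.

```python
import numpy as np

def make_schedule(T=100, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule and cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    betas = np.linspace(beta_start, beta_end, T)
    alpha_bars = np.cumprod(1.0 - betas)
    return betas, alpha_bars

def q_sample(x0, t, noise, alpha_bars):
    """Forward process: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    ab = alpha_bars[t]
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * noise

def diffusion_loss(denoiser, actions, cond, alpha_bars, rng):
    """One-sample DDPM loss: predict the injected noise from the noised action sequence.

    actions: (H, A) action sequence (H steps, A action dims)
    cond:    (C,) conditioning feature (hypothetical vision-language embedding)
    """
    t = rng.integers(len(alpha_bars))  # random diffusion step in [0, T)
    noise = rng.standard_normal(actions.shape)
    x_t = q_sample(actions, t, noise, alpha_bars)
    pred = denoiser(x_t, t, cond)
    return float(np.mean((pred - noise) ** 2))
```

In a setup like the one the abstract describes, gradients of this loss would flow into both the denoiser and the conditioning feature; the sketch shows only the forward computation, so any placeholder callable with the `(x_t, t, cond)` signature can be plugged in.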
Taxonomy
Topics: Access Control and Trust
Methods: Diffusion
