DNAct: Diffusion Guided Multi-Task 3D Policy Learning

Ge Yan; Yueh-Hua Wu; Xiaolong Wang

arXiv:2403.04115 · cs.RO · March 11, 2024 · 1 citation

Open Access

TL;DR

DNAct is a language-conditioned multi-task 3D policy learning framework that combines neural rendering pre-training with diffusion training to improve semantic understanding and robustness in robotic manipulation, reporting an improvement of over 30% in success rate over SOTA NeRF-based methods.

Contribution

The paper introduces DNAct, which integrates neural rendering pre-training with diffusion training to enforce multi-modality learning in action sequence spaces, with the aim of learning a generalizable multi-task robot policy from few demonstrations.

Findings

01. Over 30% improvement in success rate over SOTA NeRF-based methods

02. Effective distillation of 2D semantics into 3D space using foundation models

03. Enhanced robustness and generalization in multi-task robotic manipulation

Abstract

This paper presents DNAct, a language-conditioned multi-task policy framework that integrates neural rendering pre-training and diffusion training to enforce multi-modality learning in action sequence spaces. To learn a generalizable multi-task policy with few demonstrations, the pre-training phase of DNAct leverages neural rendering to distill 2D semantic features from foundation models such as Stable Diffusion to a 3D space, which provides a comprehensive semantic understanding regarding the scene. Consequently, it allows various applications to challenging robotic tasks requiring rich 3D semantics and accurate geometry. Furthermore, we introduce a novel approach utilizing diffusion training to learn a vision and language feature that encapsulates the inherent multi-modality in the multi-task demonstrations. By reconstructing the action sequences from different tasks via the diffusion…
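
As a rough illustration of what diffusion training over action sequences can look like, the sketch below implements the standard DDPM-style noise-prediction objective in plain Python. It is not taken from the paper: the abstract does not spell out the loss, so the linear noise schedule, the function names, and the `denoiser(noisy_actions, t, condition)` signature are all assumptions made for this example. The `condition` argument stands in for the vision and language feature the abstract mentions.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (a common DDPM choice, assumed here)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(actions, noise, alpha_bar_t):
    """Forward process: x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps."""
    keep = math.sqrt(alpha_bar_t)
    mix = math.sqrt(1.0 - alpha_bar_t)
    return [
        [keep * a + mix * e for a, e in zip(action_row, noise_row)]
        for action_row, noise_row in zip(actions, noise)
    ]


def noise_prediction_loss(predicted_noise, noise):
    """Mean squared error between predicted and true noise."""
    total, count = 0.0, 0
    for pred_row, noise_row in zip(predicted_noise, noise):
        for p, n in zip(pred_row, noise_row):
            total += (p - n) ** 2
            count += 1
    return total / count


def training_step(actions, denoiser, condition, alpha_bars, rng):
    """One loss evaluation for a single action sequence.

    `actions` is a list of per-timestep action vectors. `denoiser` is a
    hypothetical callable (noisy_actions, t, condition) -> predicted noise.
    """
    t = rng.randrange(len(alpha_bars))
    noise = [[rng.gauss(0.0, 1.0) for _ in row] for row in actions]
    noisy_actions = add_noise(actions, noise, alpha_bars[t])
    predicted = denoiser(noisy_actions, t, condition)
    return noise_prediction_loss(predicted, noise)


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule(100))
    # Toy action sequence: 4 timesteps of a 3-dimensional action.
    actions = [[0.1, 0.0, -0.2], [0.2, 0.1, -0.1], [0.3, 0.1, 0.0], [0.3, 0.2, 0.1]]

    def zero_denoiser(noisy_actions, t, condition):
        # Placeholder that always predicts zero noise.
        return [[0.0] * len(row) for row in noisy_actions]

    print(training_step(actions, zero_denoiser, None, alpha_bars, rng))
```

In a real system the placeholder denoiser would be a learned network and the loss would be minimized by gradient descent; the sketch only shows how a noised action sequence and its regression target are constructed.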

Taxonomy

Topics: Access Control and Trust

Methods: Diffusion