D$^3$Fields: Dynamic 3D Descriptor Fields for Zero-Shot Generalizable Rearrangement

Yixuan Wang; Mingtong Zhang; Zhuoran Li; Tarik Kelestemur; Katherine Driggs-Campbell; Jiajun Wu; Li Fei-Fei; Yunzhu Li

arXiv:2309.16118 · cs.RO · October 18, 2024 · 1 citation

TL;DR

D$^3$Fields introduces a dynamic, semantic 3D representation that fuses visual features for flexible, zero-shot robotic rearrangement, outperforming existing methods in real and simulated environments.

Contribution

The paper presents D$^3$Fields, a novel implicit 3D descriptor that captures dynamics and semantics, enabling zero-shot generalization in robotic rearrangement tasks.

Findings

01. Effective in zero-shot rearrangement tasks
02. Outperforms state-of-the-art implicit 3D representations
03. Demonstrates robustness in real-world and simulation environments

Abstract

Scene representation is a crucial design choice in robotic manipulation systems. An ideal representation is expected to be 3D, dynamic, and semantic to meet the demands of diverse manipulation tasks. However, previous works rarely provide all three properties at once. In this work, we introduce D$^3$Fields: dynamic 3D descriptor fields. These fields are implicit 3D representations that take in 3D points and output semantic features and instance masks. They can also capture the dynamics of the underlying 3D environments. Specifically, we project arbitrary 3D points in the workspace onto multi-view 2D visual observations and interpolate features derived from visual foundation models. The resulting fused descriptor fields allow for flexible goal specifications using 2D images with varied contexts, styles, and instances. To evaluate the effectiveness of these descriptor fields, we…
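The projection-and-interpolation step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a pinhole camera model with known intrinsics and world-to-camera extrinsics, precomputed per-view feature maps, and a plain average over the views in which a point is visible. The function names (`project_points`, `bilinear_sample`, `fuse_descriptors`) are hypothetical, and occlusion handling, instance masks, and any view weighting are omitted because the abstract does not specify them.

```python
import numpy as np

def project_points(points, K, T_cam_world):
    """Project (N, 3) world points to pixel coordinates with a pinhole model.

    K is a 3x3 intrinsics matrix; T_cam_world is a 4x4 world-to-camera transform.
    Returns (N, 2) pixel coordinates (u = column, v = row) and (N,) camera depths.
    """
    homo = np.concatenate([points, np.ones((points.shape[0], 1))], axis=1)
    cam = (T_cam_world @ homo.T).T[:, :3]
    z = cam[:, 2]
    safe_z = np.where(z > 1e-6, z, 1.0)  # avoid dividing by zero or negative depth
    uv = (K @ cam.T).T[:, :2] / safe_z[:, None]
    return uv, z

def bilinear_sample(feat, uv):
    """Bilinearly interpolate an (H, W, C) feature map at (N, 2) pixel coordinates."""
    H, W, _ = feat.shape
    u = np.clip(uv[:, 0], 0, W - 1)
    v = np.clip(uv[:, 1], 0, H - 1)
    u0 = np.floor(u).astype(int)
    v0 = np.floor(v).astype(int)
    u1 = np.minimum(u0 + 1, W - 1)
    v1 = np.minimum(v0 + 1, H - 1)
    du = (u - u0)[:, None]
    dv = (v - v0)[:, None]
    top = feat[v0, u0] * (1 - du) + feat[v0, u1] * du
    bottom = feat[v1, u0] * (1 - du) + feat[v1, u1] * du
    return top * (1 - dv) + bottom * dv

def fuse_descriptors(points, feats, Ks, Ts):
    """Average interpolated features over all views where each point is visible.

    Assumption: unweighted mean over views. A point is treated as visible if it
    lies in front of the camera and projects inside the image bounds.
    Returns (N, C) fused descriptors and (N,) per-point view counts.
    """
    n_points = points.shape[0]
    n_channels = feats[0].shape[2]
    acc = np.zeros((n_points, n_channels))
    count = np.zeros(n_points)
    for feat, K, T in zip(feats, Ks, Ts):
        H, W, _ = feat.shape
        uv, z = project_points(points, K, T)
        valid = (
            (z > 1e-6)
            & (uv[:, 0] >= 0) & (uv[:, 0] <= W - 1)
            & (uv[:, 1] >= 0) & (uv[:, 1] <= H - 1)
        )
        sampled = bilinear_sample(feat, uv)
        acc[valid] += sampled[valid]
        count[valid] += 1
    fused = acc / np.maximum(count, 1)[:, None]
    return fused, count
```

Points that fall outside every view keep a zero descriptor and a view count of zero, so a caller can filter them out before matching against goal-image features.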

Taxonomy

Topics: Robotics and Sensor-Based Localization · Robot Manipulation and Learning · Advanced Image and Video Retrieval Techniques

Methods: Attention Is All You Need · Softmax · Linear Layer · Multi-Head Attention · Residual Connection · Dense Connections · Layer Normalization · Vision Transformer · Self-Distillation with No Labels (DINO)