GenDP: 3D Semantic Fields for Category-Level Generalizable Diffusion Policy
Yixuan Wang, Guang Yin, Binghao Huang, Tarik Kelestemur, Jiuguang Wang, Yunzhu Li

TL;DR
This paper introduces GenDP, a framework that enhances diffusion-based robotic policies by incorporating 3D semantic fields derived from multi-view RGBD data, significantly improving generalization to unseen objects and layouts.
Contribution
The paper presents a novel method that explicitly encodes geometry and semantics into diffusion policies using 3D descriptor fields, enabling better generalization across object categories.
Findings
Raises the success rate of the diffusion-policy baseline on unseen instances from 20% to 93%.
Resolves geometric ambiguities and attends to subtle geometric details.
Demonstrates strong generalization across multiple object categories.
Abstract
Diffusion-based policies have shown remarkable capability in executing complex robotic manipulation tasks but lack explicit characterization of geometry and semantics, which often limits their ability to generalize to unseen objects and layouts. To enhance the generalization capabilities of Diffusion Policy, we introduce a novel framework that incorporates explicit spatial and semantic information via 3D semantic fields. We generate 3D descriptor fields from multi-view RGBD observations with large foundational vision models, then compare these descriptor fields against reference descriptors to obtain semantic fields. The proposed method explicitly considers geometry and semantics, enabling strong generalization capabilities in tasks requiring category-level generalization, resolving geometric ambiguities, and attention to subtle geometric details. We evaluate our method across eight…
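The comparison step in the abstract (descriptor fields matched against reference descriptors to produce semantic fields) can be illustrated with a minimal sketch. The abstract does not state which similarity measure is used, so cosine similarity is an assumption here. The function names `cosine_similarity` and `semantic_fields` and the toy descriptor values are hypothetical; in the actual method the per-point descriptors would come from a vision foundation model applied to multi-view RGBD observations.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero descriptor as matching nothing
    return dot / (norm_a * norm_b)


def semantic_fields(point_descriptors, reference_descriptors):
    """Score every 3D point against every reference descriptor.

    Returns one row per point and one column per reference, so column k
    is the (assumed) semantic field for reference descriptor k.
    """
    return [
        [cosine_similarity(d, ref) for ref in reference_descriptors]
        for d in point_descriptors
    ]


# Toy data (hypothetical): three 3D points with 2-dimensional descriptors
# and two reference descriptors, e.g. for two object parts.
point_descriptors = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
reference_descriptors = [[1.0, 0.0], [0.0, 1.0]]

fields = semantic_fields(point_descriptors, reference_descriptors)
for row in fields:
    print([round(v, 3) for v in row])
```

Each column can be read as a per-point score for one semantic part. A policy that consumes the 3D points together with these scores sees geometry and semantics explicitly, which is the property the abstract credits for category-level generalization.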
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Softmax · Attention Is All You Need · Diffusion
