SparseDFF: Sparse-View Feature Distillation for One-Shot Dexterous Manipulation
Qianxu Wang, Haotong Zhang, Congyue Deng, Yang You, Hao Dong, Yixin Zhu, Leonidas Guibas

TL;DR
SparseDFF leverages large 2D vision models to create view-consistent 3D semantic features from sparse RGBD images, enabling one-shot dexterous manipulation learning for robots across diverse objects and scenes.
Contribution
Introduces SparseDFF, a novel 3D feature distillation framework that uses large vision models and contrastive learning for efficient one-shot manipulation in robotics.
Findings
Effective in manipulating rigid objects
Generalizes well to deformable objects
Works with real-world dexterous hands
Abstract
Humans demonstrate remarkable skill in transferring manipulation abilities across objects of varying shapes, poses, and appearances, a capability rooted in their understanding of semantic correspondences between different instances. To equip robots with a similar high-level comprehension, we present SparseDFF, a novel Distilled Feature Field (DFF) for 3D scenes that uses large 2D vision models to extract semantic features from sparse RGBD images, a domain where research is limited despite its relevance to many tasks with fixed-camera setups. SparseDFF generates view-consistent 3D DFFs, enabling efficient one-shot learning of dexterous manipulations by mapping image features to a 3D point cloud. Central to SparseDFF is a feature refinement network, optimized with a contrastive loss between views and a point-pruning mechanism for feature continuity. This facilitates the minimization of feature discrepancies w.r.t.…
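The step of "mapping image features to a 3D point cloud" can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a standard pinhole camera model, a depth map already aligned with the feature map, and a hypothetical helper name `backproject_features`. Each pixel with valid depth is lifted to a camera-frame 3D point that carries its 2D feature vector along.

```python
import numpy as np

def backproject_features(depth, feats, K):
    """Lift per-pixel features to camera-frame 3D points (pinhole model).

    depth: (H, W) depth map in metres, 0 where depth is missing (assumed convention)
    feats: (H, W, C) per-pixel features, e.g. from a 2D vision model
    K:     (3, 3) camera intrinsics matrix
    Returns points (N, 3) and their features (N, C) for the N valid pixels.
    """
    H, W = depth.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    # Pixel coordinate grids: v indexes rows, u indexes columns.
    v, u = np.mgrid[0:H, 0:W]
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    points = np.stack([x, y, z], axis=1)
    return points, feats[valid]
```

With several fixed cameras, the per-view point sets would additionally be transformed into a common frame using each camera's extrinsics before being merged; that step is omitted here.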
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Robot Manipulation and Learning · Image Processing Techniques and Applications
