UAD: Unsupervised Affordance Distillation for Generalization in Robotic Manipulation

Yihe Tang; Wenlong Huang; Yingke Wang; Chengshu Li; Roy Yuan; Ruohan Zhang; Jiajun Wu; Li Fei-Fei

arXiv:2506.09284 · cs.RO · August 27, 2025

TL;DR

UAD introduces an unsupervised method to distill detailed object affordance knowledge from foundation models, enabling robots to generalize manipulation skills in unstructured environments with minimal supervision.

Contribution

The paper presents UAD, a novel unsupervised approach that leverages foundation models to automatically annotate data and train a generalizable affordance model without manual labels.
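The automatic annotation step can be illustrated with a small sketch. It assumes that a large vision model supplies one feature vector per image patch and that a vision-language model has already selected the patches relevant to an instruction. Under those assumptions, one plausible way to turn the selection into a dense pseudo-label is cosine similarity between every patch and the selected region's mean feature. The function `cosine_affordance`, its inputs, and the min-max normalization are hypothetical illustrations, not the authors' pipeline.

```python
import numpy as np


def cosine_affordance(patch_feats: np.ndarray, region_mask: np.ndarray) -> np.ndarray:
    """Hypothetical pseudo-labeling step (an assumption, not UAD's exact recipe).

    patch_feats: (N, D) frozen per-patch features from a large vision model.
    region_mask: (N,) boolean mask of patches a VLM picked for the instruction.
    Returns an (N,) affordance map scaled to [0, 1].
    """
    # L2-normalize each patch feature so dot products become cosine similarities.
    feats = patch_feats / np.linalg.norm(patch_feats, axis=1, keepdims=True)
    # Average the selected patches into a single unit-length "anchor" feature.
    anchor = feats[region_mask].mean(axis=0)
    anchor = anchor / np.linalg.norm(anchor)
    # Cosine similarity of every patch to the anchor, in [-1, 1].
    sim = feats @ anchor
    # Min-max normalize to [0, 1]; the epsilon guards against a constant map.
    lo, hi = sim.min(), sim.max()
    return (sim - lo) / (hi - lo + 1e-8)
```

Patches whose features resemble the selected region receive high scores even if the VLM never pointed at them directly. This is why dense visual features are a natural complement to the coarser, language-level choice made by the VLM.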

Findings

01

UAD achieves strong generalization to real-world robotic scenes.

02

The method enables imitation learning with as few as 10 demonstrations.

03

UAD outperforms existing methods when generalizing to unseen objects and tasks.

Abstract

Understanding fine-grained object affordances is imperative for robots to manipulate objects in unstructured environments given open-ended task instructions. However, existing methods of visual affordance prediction often rely on manually annotated data or condition only on a predefined set of tasks. We introduce UAD (Unsupervised Affordance Distillation), a method for distilling affordance knowledge from foundation models into a task-conditioned affordance model without any manual annotations. By leveraging the complementary strengths of large vision models and vision-language models, UAD automatically annotates a large-scale dataset with detailed ⟨instruction, visual affordance⟩ pairs. Training only a lightweight task-conditioned decoder atop frozen features, UAD exhibits notable generalization to in-the-wild robotic scenes and to various human activities, despite only being…
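The idea of training only a lightweight task-conditioned decoder atop frozen features can be sketched in its simplest form. This sketch assumes frozen per-patch visual features `f_i` and a frozen instruction embedding `t`. It scores each patch with a bilinear form `sigmoid(f_i^T W t)` and fits only the matrix `W` to soft pseudo-labels with binary cross-entropy. The class `BilinearAffordanceDecoder` and the bilinear design are hypothetical simplifications chosen for brevity; the abstract does not specify UAD's decoder architecture.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


class BilinearAffordanceDecoder:
    """Hypothetical minimal task-conditioned decoder (not UAD's architecture).

    Scores each frozen patch feature f_i (dim D) against a frozen instruction
    embedding t (dim T) with a learnable matrix W: p_i = sigmoid(f_i^T W t).
    Only W is trained; whatever backbones produced f and t stay frozen.
    """

    def __init__(self, feat_dim: int, text_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.01, size=(feat_dim, text_dim))

    def predict(self, patch_feats: np.ndarray, text_emb: np.ndarray) -> np.ndarray:
        # (N, D) @ (D, T) @ (T,) -> (N,) per-patch affordance probabilities.
        return sigmoid(patch_feats @ self.W @ text_emb)

    def train_step(self, patch_feats, text_emb, target, lr: float = 0.5) -> float:
        # Mean binary cross-entropy against pseudo-labels in [0, 1].
        p = self.predict(patch_feats, text_emb)
        eps = 1e-8
        loss = -np.mean(target * np.log(p + eps) + (1 - target) * np.log(1 - p + eps))
        # dLoss/dlogit_i = (p_i - y_i) / N and dlogit_i/dW = outer(f_i, t),
        # so the full gradient is outer(F^T g, t).
        grad_logits = (p - target) / len(target)
        grad_W = np.outer(patch_feats.T @ grad_logits, text_emb)
        self.W -= lr * grad_W
        return float(loss)
```

Because the backbones stay frozen, the only trainable parameters here are the `D × T` entries of `W`. That keeps training cheap and leaves the generalization to the pretrained features. A real decoder would likely be deeper, but the training signal (per-patch supervision from automatically generated labels) has the same shape.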

Taxonomy

Topics: Robot Manipulation and Learning · Reinforcement Learning in Robotics · Advanced Vision and Imaging

Methods: Sparse Evolutionary Training