UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models

Qiaojun Yu; Siyuan Huang; Xibin Yuan; Zhengkai Jiang; Ce Hao; Xin Li; Haonan Chang; Junbo Wang; Liu Liu; Hongsheng Li; Peng Gao; Cewu Lu

arXiv:2409.20551·cs.RO·February 10, 2025

TL;DR

UniAff introduces a unified framework combining 3D object manipulation and task understanding, leveraging multi-modal language models and a new dataset to enhance robotic manipulation of tools and articulated objects.

Contribution

It presents a comprehensive paradigm and dataset for unified robotic manipulation, integrating affordance recognition and 3D motion reasoning with vision-language models.

Findings

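01. Significantly improves the generalization of robotic manipulation for tools and articulated objects.

02. Demonstrates effectiveness in both simulation and real-world settings.

03. Provides a new dataset (900 articulated objects from 19 categories and 600 tools from 12 categories) and a baseline for future research.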

Abstract

Previous studies on robotic manipulation are based on a limited understanding of the underlying 3D motion constraints and affordances. To address these challenges, we propose a comprehensive paradigm, termed UniAff, that integrates 3D object-centric manipulation and task understanding in a unified formulation. Specifically, we constructed a dataset labeled with manipulation-related key attributes, comprising 900 articulated objects from 19 categories and 600 tools from 12 categories. Furthermore, we leverage MLLMs to infer object-centric representations for manipulation tasks, including affordance recognition and reasoning about 3D motion constraints. Comprehensive experiments in both simulation and real-world settings indicate that UniAff significantly improves the generalization of robotic manipulation for tools and articulated objects. We hope that UniAff will serve as a general…
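To make "3D motion constraints" concrete, here is a minimal sketch of one common case: a revolute (hinge) joint on an articulated object. It is not taken from the UniAff code or dataset format. The `RevoluteConstraint` record and the `rotate_about_axis` helper are hypothetical names introduced for illustration, and the rotation uses the standard Rodrigues formula.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class RevoluteConstraint:
    """Hypothetical record for a hinge: a point on the axis plus the axis direction."""
    origin: Vec3
    direction: Vec3


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def _scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def rotate_about_axis(point: Vec3, joint: RevoluteConstraint, angle_rad: float) -> Vec3:
    """Move a grasp point along the arc permitted by a revolute joint (Rodrigues formula)."""
    norm = math.sqrt(_dot(joint.direction, joint.direction))
    if norm == 0.0:
        raise ValueError("joint direction must be non-zero")
    k = _scale(joint.direction, 1.0 / norm)  # unit axis
    v = _sub(point, joint.origin)            # point relative to the axis
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    rotated = _add(
        _add(_scale(v, c), _scale(_cross(k, v), s)),
        _scale(k, _dot(k, v) * (1.0 - c)),
    )
    return _add(joint.origin, rotated)


# Usage: a handle 1 m from a vertical hinge, with the door opened by 90 degrees.
hinge = RevoluteConstraint(origin=(0.0, 0.0, 0.0), direction=(0.0, 0.0, 1.0))
handle_after = rotate_about_axis((1.0, 0.0, 0.0), hinge, math.pi / 2)
```

Given a predicted axis and a grasp point, a controller could sample this arc to generate waypoints. How UniAff itself represents or consumes such constraints is not specified in this summary.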

Code & Models

Models

SiyuanH/UniAff-13B (Hugging Face model)

Taxonomy

Topics: Semantic Web and Ontologies · Natural Language Processing Techniques · Multimodal Machine Learning Applications