Language-Conditioned Affordance-Pose Detection in 3D Point Clouds
Toan Nguyen, Minh Nhat Vu, Baoru Huang, Tuan Van Vo, Vy Truong, Ngan Le, Thieu Vo, Bac Le, Anh Nguyen

TL;DR
This paper introduces a novel language-conditioned method for joint affordance detection and pose estimation in 3D point clouds, enabling robots to recognize and manipulate objects with any affordance label in real-world scenarios.
Contribution
It proposes an open-vocabulary affordance detection and pose generation framework using a language-guided diffusion model, along with a new dataset for language-driven affordance-pose learning.
Findings
Effective on a wide range of open-vocabulary affordances
Outperforms baseline methods significantly
Demonstrates practical utility in robotic applications
Abstract
Affordance detection and pose estimation are of great importance in many robotic applications. Their combination helps the robot gain an enhanced manipulation capability, in which the generated pose can facilitate the corresponding affordance task. Previous methods for affordance-pose joint learning are limited to a predefined set of affordances, which restricts the adaptability of robots in real-world environments. In this paper, we propose a new method for language-conditioned affordance-pose joint learning in 3D point clouds. Given a 3D point cloud object, our method detects the affordance region and generates appropriate 6-DoF poses for any unconstrained affordance label. Our method consists of an open-vocabulary affordance detection branch and a language-guided diffusion model that generates 6-DoF poses based on the affordance text. We also introduce a new high-quality dataset for the…
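As a rough illustration of what open-vocabulary affordance detection on a point cloud can look like, the sketch below assigns each point the free-text label whose embedding is most similar to that point's feature vector. This is a common pattern for open-vocabulary labeling and is only an assumption here; the abstract does not specify how the paper's detection branch works. The function names (`cosine`, `label_points`), the `temperature` parameter, and the toy 2-D embeddings are all hypothetical stand-ins for the outputs of a real point encoder and text encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def label_points(point_feats, text_embeds, temperature=0.1):
    """For each point, return (best label index, softmax probabilities).

    point_feats: one feature vector per 3D point (hypothetical encoder output).
    text_embeds: one embedding per affordance text (hypothetical encoder output).
    """
    results = []
    for feat in point_feats:
        logits = [cosine(feat, t) / temperature for t in text_embeds]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - peak) for l in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        results.append((probs.index(max(probs)), probs))
    return results


# Toy data: two affordance texts and three points in a shared 2-D space.
labels = ["grasp", "pour"]
text_embeds = [[1.0, 0.0], [0.0, 1.0]]
point_feats = [[0.9, 0.1], [0.2, 0.8], [1.0, 0.0]]

scored = label_points(point_feats, text_embeds)
assignments = [labels[idx] for idx, _ in scored]
print(assignments)
```

Because the label set is just a list of text embeddings, a new affordance word can be added at inference time without retraining a fixed classifier head, which is the property the abstract refers to as handling any unconstrained affordance label.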
Taxonomy
Topics: Robot Manipulation and Learning · Robotic Mechanisms and Dynamics · Human Pose and Action Recognition
