3D-AffordanceLLM: Harnessing Large Language Models for Open-Vocabulary Affordance Detection in 3D Worlds
Hengshuo Chu, Xiang Deng, Qi Lv, Xiaoyang Chen, Yinchuan Li, Jianye Hao, Liqiang Nie

TL;DR
This paper introduces 3D-AffordanceLLM, a novel framework that leverages large language models for open-vocabulary affordance detection in 3D scenes, enabling natural language reasoning and improved generalization.
Contribution
It reformulates 3D affordance detection into an instruction reasoning task and proposes a multi-stage training strategy with a new pre-training task, enhancing open-world reasoning capabilities.
Findings
Achieves approximately 8% improvement in mIoU on open-vocabulary tasks.
Introduces the Instruction Reasoning Affordance Segmentation (IRAS) task for natural language-based affordance segmentation.
Develops a multi-stage training approach that begins with Referring Object Part Segmentation (ROPS) pre-training.
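For readers unfamiliar with the mIoU metric cited in the findings, the following is a minimal sketch of how mean intersection-over-union can be computed over per-point binary masks. The function names, the toy masks, and the choice to average over samples and to score two empty masks as 1.0 are assumptions made for illustration; the paper's exact evaluation protocol is not described in this summary.

```python
from typing import Sequence


def iou(pred: Sequence[bool], target: Sequence[bool]) -> float:
    """Intersection-over-union of two per-point binary masks (hypothetical helper)."""
    if len(pred) != len(target):
        raise ValueError("masks must cover the same number of points")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


def mean_iou(preds: Sequence[Sequence[bool]],
             targets: Sequence[Sequence[bool]]) -> float:
    """Average the per-sample IoU over all (prediction, ground truth) pairs."""
    if len(preds) != len(targets):
        raise ValueError("need one ground-truth mask per prediction")
    scores = [iou(p, t) for p, t in zip(preds, targets)]
    if not scores:
        raise ValueError("need at least one mask pair")
    return sum(scores) / len(scores)


# Toy example: two point clouds with four points each.
pred_a = [True, True, False, False]
gt_a = [True, False, False, False]    # intersection 1, union 2
pred_b = [False, True, True, True]
gt_b = [False, True, True, False]     # intersection 2, union 3

example_miou = mean_iou([pred_a, pred_b], [gt_a, gt_b])
print(f"mIoU = {example_miou:.4f}")
```

In this toy case the two per-sample IoU values are 1/2 and 2/3, so their mean is 7/12.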
Abstract
3D affordance detection is a challenging problem with broad applications in various robotic tasks. Existing methods typically formulate the detection paradigm as a label-based semantic segmentation task. This paradigm relies on predefined labels and lacks the ability to comprehend complex natural language, resulting in limited generalization in open-world scenes. To address these limitations, we reformulate the traditional affordance detection paradigm into the Instruction Reasoning Affordance Segmentation (IRAS) task. This task is designed to output an affordance mask region given a query reasoning text, which avoids fixed categories of input labels. We accordingly propose 3D-AffordanceLLM (3D-ADLLM), a framework designed for reasoning affordance detection in 3D open scenes. Specifically, 3D-ADLLM introduces large language models (LLMs) to 3D affordance perception with a…
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling
