Part-Aware Open-Vocabulary 3D Affordance Grounding via Prototypical Semantic and Geometric Alignment
Dongqiang Gou, Xuming He

TL;DR
This paper introduces a two-stage cross-modal framework that improves open-vocabulary 3D affordance grounding by enhancing semantic and geometric representations, enabling better part-level alignment and generalization.
Contribution
The paper proposes a two-stage approach that combines LLM-generated part-aware instructions, Affordance Prototype Aggregation, and Intra-Object Relational Modeling to improve 3D affordance grounding.
Findings
Outperforms existing methods on multiple benchmarks
Uses large language models to improve semantic consistency
Improves geometric alignment and part-level differentiation
Abstract
Grounding natural language questions to functionally relevant regions in 3D objects -- termed language-driven 3D affordance grounding -- is essential for embodied intelligence and human-AI interaction. Existing methods, while progressing from label-based to language-driven approaches, still face challenges in open-vocabulary generalization, fine-grained geometric alignment, and part-level semantic consistency. To address these issues, we propose a novel two-stage cross-modal framework that enhances both semantic and geometric representations for open-vocabulary 3D affordance grounding. In the first stage, large language models generate part-aware instructions to recover missing semantics, enabling the model to link semantically similar affordances. In the second stage, we introduce two key components: Affordance Prototype Aggregation (APA), which captures cross-object geometric…
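The abstract is truncated before it explains Affordance Prototype Aggregation (APA), so the following is only a minimal sketch of the generic prototype idea named in the title. It is not the authors' module. It assumes each point already has a feature vector (for example, from some point-cloud encoder) and an affordance label. `build_prototypes`, `cosine`, and `ground` are hypothetical helpers. Each prototype is the mean feature of all points that share an affordance label, pooled across objects, and the points of a new object are scored by cosine similarity to the prototype of the queried affordance.

```python
import math
from collections import defaultdict

# Hypothetical sketch of prototype-based affordance scoring (generic
# prototypical-network idea; NOT the paper's APA implementation).
# Assumption: features are plain lists of floats, one per 3D point.

def build_prototypes(samples):
    """samples: list of (feature, label) pairs pooled across many objects.
    Returns {label: mean feature vector}."""
    sums = {}
    counts = defaultdict(int)
    for feat, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(feat)
        for i, v in enumerate(feat):
            sums[label][i] += v
        counts[label] += 1
    return {label: [v / counts[label] for v in vec] for label, vec in sums.items()}

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def ground(points, prototypes, affordance):
    """Score every point of a new object against one affordance prototype."""
    proto = prototypes[affordance]
    return [cosine(p, proto) for p in points]

# Toy usage with made-up 2-D features (illustrative only).
train = [([1.0, 0.0], "grasp"), ([0.9, 0.1], "grasp"), ([0.0, 1.0], "pour")]
protos = build_prototypes(train)
scores = ground([[1.0, 0.0], [0.0, 1.0]], protos, "grasp")
```

Pooling across objects is what lets a prototype capture shared geometry for an affordance, such as the handles of different mugs. A point that resembles that shared geometry scores high even on an unseen object.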
Taxonomy
Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Human Motion and Animation
