Part-Aware Open-Vocabulary 3D Affordance Grounding via Prototypical Semantic and Geometric Alignment

Dongqiang Gou; Xuming He

arXiv:2603.17647 · cs.CV · March 19, 2026

TL;DR

This paper introduces a two-stage cross-modal framework that improves open-vocabulary 3D affordance grounding by enhancing semantic and geometric representations, enabling better part-level alignment and generalization.

Contribution

The paper proposes a two-stage approach that combines LLM-generated part-aware instructions with Affordance Prototype Aggregation (APA) and Intra-Object Relational Modeling to improve 3D affordance grounding.

Findings

01 Outperforms existing methods on multiple benchmarks

02 Improves semantic consistency by using large language models to generate part-aware instructions

03 Improves geometric alignment and part-level differentiation

Abstract

Grounding natural language questions to functionally relevant regions in 3D objects -- termed language-driven 3D affordance grounding -- is essential for embodied intelligence and human-AI interaction. Existing methods, while progressing from label-based to language-driven approaches, still face challenges in open-vocabulary generalization, fine-grained geometric alignment, and part-level semantic consistency. To address these issues, we propose a novel two-stage cross-modal framework that enhances both semantic and geometric representations for open-vocabulary 3D affordance grounding. In the first stage, large language models generate part-aware instructions to recover missing semantics, enabling the model to link semantically similar affordances. In the second stage, we introduce two key components: Affordance Prototype Aggregation (APA), which captures cross-object geometric…
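To make the task itself concrete, the sketch below shows one generic way language-driven grounding can be posed: each 3D point carries a feature vector, the question is embedded into the same space, and points whose features align with the text are marked as the affordance region. This is an illustration of the problem setup only, not the method proposed in the paper; the function names, the cosine-similarity scoring, the threshold, and the toy feature values are all assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def ground_affordance(point_feats, text_feat, threshold=0.5):
    """Score every point against the text feature and threshold into a 0/1 mask.

    Hypothetical helper: a real system would learn both feature extractors.
    """
    scores = [cosine(f, text_feat) for f in point_feats]
    mask = [1 if s >= threshold else 0 for s in scores]
    return scores, mask

# Made-up features for four points of a mug: two on the handle, two on the body.
point_feats = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.0, 1.0]]
# Hypothetical embedding of a question such as "Where would you grasp this mug?"
text_feat = [1.0, 0.0]

scores, mask = ground_affordance(point_feats, text_feat)
print(mask)  # [1, 1, 0, 0]: only the handle points are selected
```

In this toy setup, the open-vocabulary difficulty the abstract describes corresponds to questions whose embeddings were never seen during training, and the part-level difficulty corresponds to neighbouring points on different parts receiving similar features.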

Taxonomy

Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Human Motion and Animation