GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding

Yawen Shao; Wei Zhai; Yuhang Yang; Hongchen Luo; Yang Cao; Zheng-Jun Zha

arXiv:2411.19626 · cs.CV · April 1, 2025

TL;DR

GREAT is a framework for open-vocabulary 3D object affordance grounding that mines invariant object geometry attributes and reasons analogically about potential interaction intentions to localize action possibilities on 3D objects.

Contribution

The paper introduces GREAT, a new method that combines geometric invariance and analogical reasoning for open-vocabulary 3D affordance grounding, along with the PIADv2 dataset.

Findings

01

GREAT outperforms existing methods in 3D affordance grounding tasks.

02

The PIADv2 dataset is the largest of its kind for this task.

03

Extensive experiments validate the effectiveness of GREAT.

Abstract

Open-Vocabulary 3D object affordance grounding aims to anticipate "action possibilities" regions on 3D objects with arbitrary instructions, which is crucial for robots to generically perceive real scenarios and respond to operational changes. Existing methods focus on combining images or languages that depict interactions with 3D geometries to introduce external interaction priors. However, they are still vulnerable to a limited semantic space by failing to leverage implied invariant geometries and potential interaction intentions. Normally, humans address complex tasks through multi-step reasoning and respond to diverse situations by leveraging associative and analogical thinking. In light of this, we propose GREAT (GeometRy-intEntion collAboraTive inference) for Open-Vocabulary 3D Object Affordance Grounding, a novel framework that mines the object invariant geometry attributes and…

Code & Models

Repositories

yawen-shao/great_code (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Multimodal Machine Learning Applications · Human Pose and Action Recognition

Methods: Focus