GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding
Yawen Shao, Wei Zhai, Yuhang Yang, Hongchen Luo, Yang Cao, Zheng-Jun Zha

TL;DR
GREAT is a novel framework that enhances 3D object affordance grounding by leveraging invariant geometries and analogical reasoning, significantly improving understanding of action possibilities on 3D objects.
Contribution
The paper introduces GREAT, a new method that combines geometric invariance and analogical reasoning for open-vocabulary 3D affordance grounding, along with the PIADv2 dataset.
Findings
GREAT outperforms existing methods in 3D affordance grounding tasks.
The PIADv2 dataset is the largest of its kind for this task.
Extensive experiments validate the effectiveness of GREAT.
Abstract
Open-Vocabulary 3D object affordance grounding aims to anticipate "action possibilities" regions on 3D objects with arbitrary instructions, which is crucial for robots to generically perceive real scenarios and respond to operational changes. Existing methods focus on combining images or languages that depict interactions with 3D geometries to introduce external interaction priors. However, they are still vulnerable to a limited semantic space by failing to leverage implied invariant geometries and potential interaction intentions. Normally, humans address complex tasks through multi-step reasoning and respond to diverse situations by leveraging associative and analogical thinking. In light of this, we propose GREAT (GeometRy-intEntion collAboraTive inference) for Open-Vocabulary 3D Object Affordance Grounding, a novel framework that mines the object invariant geometry attributes and…
Taxonomy
Topics: Natural Language Processing Techniques · Multimodal Machine Learning Applications · Human Pose and Action Recognition
