Learning 2D Invariant Affordance Knowledge for 3D Affordance Grounding
Xianqiang Gao, Pingrui Zhang, Delin Qu, Dong Wang, Zhigang Wang, Yan Ding, Bin Zhao

TL;DR
This paper introduces a novel framework for 3D affordance grounding that leverages multiple human-object interaction images to learn invariant affordance knowledge, improving generalization and accuracy in identifying functional regions on 3D objects.
Contribution
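The paper proposes the MIFAG framework, which extracts invariant affordance knowledge from multiple human-object interaction images of the same affordance category and integrates it into 3D affordance grounding. This addresses the geometric inconsistency between the 3D object and the object shown in any single interaction image.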
Findings
MIFAG outperforms existing methods on the MIPA benchmark.
It learns invariant affordance features from multiple images.
It improves generalization in 3D affordance prediction.
Abstract
3D Object Affordance Grounding aims to predict the functional regions on a 3D object and has laid the foundation for a wide range of applications in robotics. Recent advances tackle this problem via learning a mapping between 3D regions and a single human-object interaction image. However, the geometric structure of the 3D object and the object in the human-object interaction image are not always consistent, leading to poor generalization. To address this issue, we propose to learn generalizable invariant affordance knowledge from multiple human-object interaction images within the same affordance category. Specifically, we introduce the Multi-Image Guided Invariant-Feature-Aware 3D Affordance Grounding (MIFAG) framework. It grounds 3D object affordance regions by identifying common interaction patterns across multiple human-object interaction images. First, the Invariant Affordance…
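The core idea in the abstract, pooling what several interaction images of one affordance category have in common and using it to score regions of a 3D object, can be illustrated with a toy sketch. This is not the MIFAG architecture: the abstract is truncated before it describes the modules, so mean pooling and cosine similarity are swapped in here as simple stand-ins. The function names (`mean_pool`, `ground_affordance`), the feature vectors, and the `threshold` parameter are all hypothetical.

```python
import math

def mean_pool(image_feats):
    """Average K image feature vectors into one shared vector.
    Stand-in for invariant-knowledge extraction (an assumption, not MIFAG's module)."""
    k = len(image_feats)
    dim = len(image_feats[0])
    return [sum(f[d] for f in image_feats) / k for d in range(dim)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def ground_affordance(image_feats, point_feats, threshold=0.5):
    """Score each 3D point feature against the pooled image feature.
    Returns per-point scores and a boolean mask of points at or above the threshold."""
    shared = mean_pool(image_feats)
    scores = [cosine(shared, p) for p in point_feats]
    mask = [s >= threshold for s in scores]
    return scores, mask

# Toy data: three image features from one affordance category (hypothetical values)
image_feats = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1], [1.0, -0.1, 0.0]]
# Three per-point features from a 3D object (hypothetical values)
point_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.8, 0.1, 0.0]]

scores, mask = ground_affordance(image_feats, point_feats)
```

Averaging suppresses the image-specific components (the second and third feature dimensions in the toy data) while keeping the component all images share, which is the intuition behind using several images rather than one. A learned model would replace both stand-ins with trained modules.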
Taxonomy
Topics: Robot Manipulation and Learning · Anomaly Detection Techniques and Applications · Human Pose and Action Recognition
