3D Concept Grounding on Neural Fields
Yining Hong, Yilun Du, Chunru Lin, Joshua B. Tenenbaum, Chuang Gan

TL;DR
This paper introduces an approach to 3D concept grounding built on neural fields. It enables flexible, continuous 3D segmentation learned from question-answer supervision and outperforms existing segmentation models.
Contribution
It leverages neural fields for differentiable 3D concept grounding, allowing joint segmentation and learning directly from language and question-answer data.
Findings
Outperforms existing segmentation models on semantic and instance tasks.
Achieves superior results on 3D visual reasoning benchmarks.
Generalizes well to unseen categories and real scans.
Abstract
In this paper, we address the challenging problem of 3D concept grounding (i.e., segmenting and learning visual concepts) by looking at RGBD images and reasoning about paired questions and answers. Existing visual reasoning approaches typically utilize supervised methods to extract 2D segmentation masks on which concepts are grounded. In contrast, humans are capable of grounding concepts on the underlying 3D representation of images. However, traditionally inferred 3D representations (e.g., point clouds, voxel grids, and meshes) cannot capture continuous 3D features flexibly, thus making it challenging to ground concepts to 3D regions based on the language description of the object being referred to. To address both issues, we propose to leverage the continuous, differentiable nature of neural fields to segment and learn concepts. Specifically, each 3D coordinate in a scene is represented…
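The idea of grounding concepts on a continuous, differentiable field can be illustrated with a small sketch. This is not the paper's architecture: the tiny random-weight MLP, the feature size, the cosine-similarity scoring, the softmax temperature, and every name below (`field_features`, `concept_scores`, `soft_segmentation`) are assumptions made for illustration only. The sketch shows how any 3D coordinate can be queried for a feature vector and softly assigned to one of several concept embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

FEATURE_DIM = 8   # hypothetical size of the per-point feature vector
HIDDEN_DIM = 16   # hypothetical hidden width of the toy field

# Random weights stand in for a trained field; a real model would learn these.
W1 = rng.normal(size=(3, HIDDEN_DIM))
b1 = np.zeros(HIDDEN_DIM)
W2 = rng.normal(size=(HIDDEN_DIM, FEATURE_DIM))
b2 = np.zeros(FEATURE_DIM)


def field_features(xyz):
    """Map continuous 3D coordinates (N, 3) to feature vectors (N, FEATURE_DIM)."""
    xyz = np.atleast_2d(xyz).astype(float)
    hidden = np.tanh(xyz @ W1 + b1)
    return hidden @ W2 + b2


def concept_scores(xyz, concept_embeddings):
    """Cosine similarity between each point's feature and each concept embedding."""
    feats = field_features(xyz)
    feats = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    concepts = concept_embeddings / (
        np.linalg.norm(concept_embeddings, axis=1, keepdims=True) + 1e-8
    )
    return feats @ concepts.T


def soft_segmentation(xyz, concept_embeddings, temperature=0.1):
    """Softmax over concepts gives a differentiable, per-point soft mask."""
    logits = concept_scores(xyz, concept_embeddings) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


# Three hypothetical concept embeddings and five arbitrary query points.
concepts = rng.normal(size=(3, FEATURE_DIM))
points = rng.uniform(-1.0, 1.0, size=(5, 3))
masks = soft_segmentation(points, concepts)
print(masks.shape)  # one row per point, one column per concept
```

Because the field is a function of the coordinate rather than a fixed grid or point set, it can be queried at any location, and every step from coordinate to mask is differentiable, so in principle a loss on downstream answers could be backpropagated to the field.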
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition
Methods: Attentive Walk-Aggregating Graph Neural Network
