Weakly-Supervised 3D Scene Graph Generation via Visual-Linguistic Assisted Pseudo-labeling
Xu Wang, Yifan Li, Qiudan Zhang, Wenhui Wu, Mark Junjie Li, Jianmin Jiang

TL;DR
This paper introduces 3D-VLAP, a weakly-supervised method for 3D scene graph generation that leverages visual-linguistic models and pseudo-labeling to reduce annotation effort while maintaining high performance.
Contribution
The paper proposes a novel weakly-supervised approach using visual-linguistic models and pseudo-labeling for 3D scene graph generation, reducing reliance on extensive annotations.
Findings
Achieves results comparable to fully supervised methods
Significantly reduces data annotation requirements
Uses cross-modal alignment for pseudo-label generation
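To make the last finding concrete, here is a minimal sketch of how cross-modal alignment could yield a pseudo-label. It assumes that an image crop and each candidate class name have already been embedded into a shared space by some visual-linguistic model. The function names, the toy 3-dimensional embeddings, and the 0.5 threshold are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pseudo_label(image_emb, text_embs, threshold=0.5):
    """Pick the class whose text embedding best matches the image embedding.

    Returns (label, score). The label is None when the best score falls
    below the threshold, i.e. the match is too weak to trust as a label.
    """
    best_label, best_score = None, -1.0
    for label, emb in text_embs.items():
        score = cosine(image_emb, emb)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


# Toy embeddings (hypothetical): real ones would come from a pretrained model.
toy_text_embs = {"chair": [1.0, 0.0, 0.0], "table": [0.0, 1.0, 0.0]}
label, score = pseudo_label([0.9, 0.1, 0.0], toy_text_embs)
print(label, round(score, 3))
```

The threshold keeps low-confidence matches out of the pseudo-label set. How the paper itself filters or weights pseudo-labels is not stated in this summary.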
Abstract
Learning to build 3D scene graphs is essential for real-world perception in a structured and rich fashion. However, previous 3D scene graph generation methods utilize a fully supervised learning manner and require a large amount of entity-level annotation data of objects and relations, which is extremely resource-consuming and tedious to obtain. To tackle this problem, we propose 3D-VLAP, a weakly-supervised 3D scene graph generation method via Visual-Linguistic Assisted Pseudo-labeling. Specifically, our 3D-VLAP exploits the superior ability of current large-scale visual-linguistic models to align the semantics between texts and 2D images, as well as the naturally existing correspondences between 2D images and 3D point clouds, and thus implicitly constructs correspondences between texts and 3D point clouds. First, we establish the positional correspondence from 3D point clouds to 2D…
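The abstract is truncated where it begins to describe the 3D-to-2D positional correspondence, so the paper's exact procedure is not shown here. As a generic illustration only, the sketch below projects the points of a 3D object into an image with a standard pinhole camera model and takes the bounding box of the visible pixels. It assumes the points are already in camera coordinates and that the intrinsics (`fx`, `fy`, `cx`, `cy`) are known. The function names are hypothetical.

```python
def project_points(points_cam, fx, fy, cx, cy):
    """Project 3D points given in camera coordinates to pixel coordinates.

    Uses the pinhole model u = fx * x / z + cx, v = fy * y / z + cy.
    Points at or behind the camera plane (z <= 0) are skipped.
    """
    pixels = []
    for x, y, z in points_cam:
        if z <= 0:
            continue
        pixels.append((fx * x / z + cx, fy * y / z + cy))
    return pixels


def bounding_box(pixels, width, height):
    """Axis-aligned box (u_min, v_min, u_max, v_max) of the pixels inside the image.

    Returns None when no projected point lands inside the image bounds.
    """
    inside = [(u, v) for u, v in pixels if 0 <= u < width and 0 <= v < height]
    if not inside:
        return None
    us = [u for u, _ in inside]
    vs = [v for _, v in inside]
    return (min(us), min(vs), max(us), max(vs))


# Toy object: three points two metres in front of the camera (assumed values).
object_points = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (0.0, 1.0, 2.0)]
pixels = project_points(object_points, fx=100.0, fy=100.0, cx=50.0, cy=50.0)
print(bounding_box(pixels, width=200, height=200))
```

A box like this could be used to crop the image region for an object before handing it to a visual-linguistic model. Whether the paper uses boxes, masks, or per-point features cannot be determined from the truncated text.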
Taxonomy
Topics: Human Motion and Animation · Video Analysis and Summarization · Computer Graphics and Visualization Techniques
Methods: Graph Neural Network · ALIGN
