Weakly-Supervised 3D Scene Graph Generation via Visual-Linguistic Assisted Pseudo-labeling

Xu Wang; Yifan Li; Qiudan Zhang; Wenhui Wu; Mark Junjie Li; Jianmin Jiang

arXiv:2404.02527 · cs.CV · April 4, 2024 · 1 citation

Open Access · 1 Repo

TL;DR

This paper introduces 3D-VLAP, a weakly-supervised method for 3D scene graph generation that leverages visual-linguistic models and pseudo-labeling to reduce annotation effort while maintaining high performance.

Contribution

The paper proposes a novel weakly-supervised approach using visual-linguistic models and pseudo-labeling for 3D scene graph generation, reducing reliance on extensive annotations.

Findings

01 Achieves comparable results to fully supervised methods

02 Significantly reduces data annotation requirements

03 Uses cross-modal alignment for pseudo-label generation

Abstract

Learning to build 3D scene graphs is essential for structured, rich perception of the real world. However, previous 3D scene graph generation methods rely on fully supervised learning and require large amounts of entity-level annotations of objects and relations, which are extremely resource-consuming and tedious to obtain. To tackle this problem, we propose 3D-VLAP, a weakly-supervised 3D scene graph generation method via Visual-Linguistic Assisted Pseudo-labeling. Specifically, 3D-VLAP exploits the strong ability of current large-scale visual-linguistic models to align the semantics of texts and 2D images, as well as the naturally existing correspondences between 2D images and 3D point clouds, and thus implicitly constructs correspondences between texts and 3D point clouds. First, we establish the positional correspondence from 3D point clouds to 2D…
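
The idea in the abstract (text-to-image alignment combined with image-to-point-cloud correspondence) can be illustrated with a toy sketch. This is not the authors' implementation: the pinhole projection, the cosine-similarity scoring, the 0.5 threshold, and every name below (`project_point`, `assign_pseudo_label`, the hand-written embeddings) are assumptions made purely for illustration. In practice the embeddings would come from a visual-linguistic model rather than being typed in by hand.

```python
import math


def project_point(point_cam, fx, fy, cx, cy):
    """Project a 3D point (camera coordinates) to pixel coordinates.

    Assumes a simple pinhole camera; returns None for points behind
    the camera, which have no 2D correspondence.
    """
    x, y, z = point_cam
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_pseudo_label(image_embedding, text_embeddings, threshold=0.5):
    """Pick the text label whose embedding best matches the image embedding.

    Returns (label, score); label is None when the best score falls
    below the (hypothetical) confidence threshold.
    """
    best_label, best_score = None, -1.0
    for label, embedding in text_embeddings.items():
        score = cosine(image_embedding, embedding)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


# Toy data: one 3D point and two hand-written "text embeddings".
pixel = project_point((0.5, -0.25, 2.0), fx=500.0, fy=500.0, cx=320.0, cy=240.0)
text_embeddings = {"chair": [1.0, 0.0, 0.0], "table": [0.0, 1.0, 0.0]}
label, score = assign_pseudo_label([0.9, 0.1, 0.0], text_embeddings)
print(pixel, label, round(score, 3))
```

Here the projected pixel tells us which image region a 3D point falls into, and the label assigned to that region's embedding is carried back to the point as a pseudo-label. Low-confidence matches are dropped rather than guessed, which is a common choice in pseudo-labeling but is assumed here, not taken from the paper.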

Code & Models

Repositories

liyifan-123/3dvlap
PyTorch · Official

Taxonomy

Topics: Human Motion and Animation · Video Analysis and Summarization · Computer Graphics and Visualization Techniques

Methods: Graph Neural Network · ALIGN