Language-Assisted 3D Feature Learning for Semantic Scene Understanding

Junbo Zhang; Guofan Fan; Guanghan Wang; Zhengyuan Su; Kaisheng Ma; Li Yi

arXiv:2211.14091 · cs.CV · December 13, 2022 · 1 citation

TL;DR

This paper introduces a novel language-assisted training approach for 3D scene understanding that leverages textual descriptions to improve 3D feature learning, especially in label-scarce scenarios, and enhances multimodal task performance.

Contribution

It proposes a method to incorporate textual scene descriptions into 3D feature learning via auxiliary tasks, improving semantic understanding and multimodal alignment.

Findings

1. Enhanced 3D semantic scene understanding performance.
2. Better alignment between 3D features and language features.
3. Effective in label-deficient regimes.

Abstract

Learning descriptive 3D features is crucial for understanding 3D scenes with diverse objects and complex structures. However, it is usually unknown whether important geometric attributes and scene context obtain enough emphasis in an end-to-end trained 3D scene understanding network. To guide 3D feature learning toward important geometric attributes and scene context, we explore the help of textual scene descriptions. Given some free-form descriptions paired with 3D scenes, we extract the knowledge regarding the object relationships and object attributes. We then inject the knowledge to 3D feature learning through three classification-based auxiliary tasks. This language-assisted training can be combined with modern object detection and instance segmentation methods to promote 3D semantic scene understanding, especially in a label-deficient regime. Moreover, the 3D feature learned with…
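The abstract describes adding classification-based auxiliary tasks on top of a main detection or segmentation objective. A minimal sketch of how such a combined training loss could be formed is given below. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `aux_weight` value, and the example logits and targets are all hypothetical, and the three auxiliary heads are left unnamed because the abstract does not specify them.

```python
import math


def softmax_cross_entropy(logits, target):
    """Cross-entropy for one sample, given raw class scores and the true class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def total_loss(main_loss, aux_outputs, aux_weight=0.5):
    """Main task loss plus a weighted sum of auxiliary classification losses.

    aux_outputs: list of (logits, target) pairs, one per auxiliary task.
    aux_weight:  hypothetical weighting factor (an assumption, not from the paper).
    """
    aux = sum(softmax_cross_entropy(logits, target) for logits, target in aux_outputs)
    return main_loss + aux_weight * aux


# Hypothetical outputs of three auxiliary classification heads (placeholder values).
aux_outputs = [
    ([2.0, 0.5, -1.0], 0),
    ([0.1, 1.2], 1),
    ([0.0, 0.0, 0.0, 0.0], 2),
]
loss = total_loss(main_loss=1.3, aux_outputs=aux_outputs)
print(f"combined loss: {loss:.4f}")
```

In this sketch the auxiliary terms only add to the training objective; at inference time the main task head would be used alone, which is one common way auxiliary supervision is set up.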

Code & Models

Repositories

asterisci/language-assisted-3d (Official)

Videos

Language-Assisted 3D Feature Learning for Semantic Scene Understanding

Taxonomy

Topics: 3D Surveying and Cultural Heritage · Human Pose and Action Recognition · Hand Gesture Recognition Systems