Joint Representation Learning for Text and 3D Point Cloud

Rui Huang; Xuran Pan; Henry Zheng; Haojun Jiang; Zhifeng Xie; Shiji Song; Gao Huang

arXiv:2301.07584·cs.CV·January 19, 2023·1 cites

Open Access

TL;DR

This paper introduces Text4Point, a framework that leverages 2D images as a bridge to align 3D point cloud representations with language, improving performance on various 3D understanding tasks.

Contribution

Text4Point aligns 3D point cloud representations with text by using 2D images as a bridge and contrastive learning for alignment, which addresses the scarcity of 3D-text data pairs and the irregular structure of 3D data.

Findings

01. Improved performance on point cloud segmentation and detection tasks.

02. Effective alignment of 3D features with language embeddings.

03. Versatile framework applicable to multiple 3D tasks.

Abstract

Recent advancements in vision-language pre-training (e.g., CLIP) have shown that vision models can benefit from language supervision. While many models using the language modality have achieved great success on 2D vision tasks, the joint representation learning of 3D point clouds with text remains under-explored due to the difficulty of acquiring 3D-text data pairs and the irregularity of 3D data structures. In this paper, we propose a novel Text4Point framework to construct language-guided 3D point cloud models. The key idea is to utilize 2D images as a bridge connecting the point cloud and language modalities. The proposed Text4Point follows the pre-training and fine-tuning paradigm. During the pre-training stage, we establish the correspondence of images and point clouds based on the readily available RGB-D data and use contrastive learning to align the image and point cloud…
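The abstract names contrastive learning as the mechanism for aligning image and point cloud features but, as excerpted here, does not give the loss. As a rough illustration only, the sketch below uses a symmetric InfoNCE loss in the style of CLIP; that choice, the function names `l2_normalize` and `info_nce`, and the temperature value are all assumptions and not taken from the paper.

```python
import math


def l2_normalize(v):
    """Scale a feature vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(point_feats, image_feats, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired features.

    Assumption: row i of point_feats and row i of image_feats come from
    the same RGB-D correspondence (a positive pair); every other pairing
    in the batch is treated as a negative.
    """
    p = [l2_normalize(v) for v in point_feats]
    q = [l2_normalize(v) for v in image_feats]
    n = len(p)

    # logits[i][j] is the temperature-scaled cosine similarity between
    # point feature i and image feature j.
    logits = [
        [sum(a * b for a, b in zip(p[i], q[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_on_diagonal(rows):
        # Mean cross-entropy where the correct class of row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    # Average the point-to-image and image-to-point directions.
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (
        cross_entropy_on_diagonal(logits)
        + cross_entropy_on_diagonal(transposed)
    )


# Toy features: matched pairs should give a much lower loss than swapped ones.
points = [[1.0, 0.0], [0.0, 1.0]]
images_matched = [[2.0, 0.0], [0.0, 3.0]]
images_swapped = [[0.0, 3.0], [2.0, 0.0]]
loss_matched = info_nce(points, images_matched)
loss_swapped = info_nce(points, images_swapped)
```

In this toy batch the loss is near zero when each point feature points in the same direction as its paired image feature, and large when the pairs are swapped, which is the behavior a contrastive alignment objective is meant to reward.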

Taxonomy

Topics: Multimodal Machine Learning Applications · Handwritten Text Recognition Techniques · Advanced Neural Network Applications

Methods: Contrastive Language-Image Pre-training · Contrastive Learning · ALIGN