Pic@Point: Cross-Modal Learning by Local and Global Point-Picture Correspondence

Vencia Herzog; Stefan Suwelack

arXiv:2410.09519 · cs.CV · October 15, 2024

TL;DR

Pic@Point introduces a novel contrastive learning method leveraging 2D-3D correspondences to enhance 3D point cloud representations, outperforming existing pre-training techniques on multiple benchmarks.

Contribution

It presents a new cross-modal contrastive learning approach that uses image cues to improve 3D point cloud pre-training, addressing limitations of previous methods.

Findings

01

Outperforms state-of-the-art pre-training methods on 3D benchmarks.

02

Effectively leverages image semantics for 3D point cloud learning.

03

Provides a lightweight yet powerful pre-training approach.

Abstract

Self-supervised pre-training has achieved remarkable success in NLP and 2D vision. However, these advances have yet to translate to 3D data. Techniques like masked reconstruction face inherent challenges on unstructured point clouds, while many contrastive learning tasks lack complexity and informative value. In this paper, we present Pic@Point, an effective contrastive learning method based on structural 2D-3D correspondences. We leverage image cues rich in semantic and contextual knowledge to provide a guiding signal for point cloud representations at various abstraction levels. Our lightweight approach outperforms state-of-the-art pre-training methods on several 3D benchmarks.
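To make the idea of contrastive learning over 2D-3D correspondences more concrete, here is a minimal sketch of a symmetric InfoNCE-style loss between paired point and image features. The abstract does not specify the loss, encoders, or hyperparameters, so everything here is an assumption for illustration: the function names `l2_normalize` and `info_nce`, the temperature of 0.07, and the premise that row i of each feature list describes the same point-pixel correspondence. It should not be read as the authors' implementation.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(point_feats, image_feats, temperature=0.07):
    """Symmetric InfoNCE over N paired feature vectors.

    Assumption: point_feats[i] and image_feats[i] come from the same
    2D-3D correspondence (the positive pair); every other pairing in
    the batch is treated as a negative.
    """
    assert point_feats and len(point_feats) == len(image_feats)
    p = [l2_normalize(v) for v in point_feats]
    q = [l2_normalize(v) for v in image_feats]
    n = len(p)

    # Cosine similarities between every point and image feature, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(p[i], q[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_diag(rows):
        # Mean cross-entropy where the target class of row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            lse = m + math.log(sum(math.exp(x - m) for x in row))
            total += lse - row[i]
        return total / len(rows)

    # Average the point-to-image and image-to-point directions.
    cols = [list(c) for c in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(cols))


if __name__ == "__main__":
    points = [[1.0, 0.0], [0.0, 1.0]]
    print(info_nce(points, points))        # matched pairs: loss near zero
    print(info_nce(points, points[::-1]))  # mismatched pairs: much larger loss
```

Under the same assumptions, a "local and global" objective could be sketched as the sum of two such terms, one over per-point features paired with their corresponding pixel features and one over pooled per-object features paired with whole-image features. How the paper actually combines or weights these levels is not stated in the abstract.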

Taxonomy

Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques

Methods: Contrastive Learning