Fine-grained Image-to-LiDAR Contrastive Distillation with Visual Foundation Models

Yifan Zhang; Junhui Hou

arXiv:2405.14271 · cs.CV · January 3, 2025

TL;DR

This paper introduces a novel contrastive distillation method that leverages Visual Foundation Models and structured feature spaces to improve 3D representation learning from images and LiDAR data. It addresses a self-conflict in which contrastive losses push apart the features of unmatched points and pixels that share a semantic label.

Contribution

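It proposes using off-the-shelf VFMs to generate semantic labels for weakly-supervised pixel-to-point distillation, von Mises-Fisher (vMF) distributions to keep same-class embeddings consistent across inputs, and adaptive sampling to enhance image-to-LiDAR contrastive learning.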
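To make the vMF idea concrete, here is a minimal NumPy sketch, not the authors' implementation (the official code is in eaphan/olivine). It assumes each class k is modeled by a vMF distribution on the unit hypersphere with mean direction μ_k, a single concentration κ shared by all classes, and equal class priors. Under those assumptions the vMF normalizing constant is the same for every class and cancels, so the class posterior reduces to a softmax over κ μ_kᵀx. The function names (estimate_vmf_params, vmf_log_posteriors, vmf_nll) are hypothetical, and the κ estimate uses the Banerjee et al. (2005) approximation rather than anything stated in the paper.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Project each row onto the unit hypersphere, where vMF densities live."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def estimate_vmf_params(features, labels, num_classes):
    """Per-class mean directions plus one shared concentration kappa (hypothetical helper).

    kappa uses the Banerjee et al. (2005) approximation r * (d - r^2) / (1 - r^2),
    where r is the mean resultant length. Assumes every class has at least one sample.
    """
    x = l2_normalize(features)
    d = x.shape[1]
    mus, resultants = [], []
    for k in range(num_classes):
        mask = labels == k
        s = x[mask].sum(axis=0)
        norm = np.linalg.norm(s)
        mus.append(s / norm)                       # unit mean direction of class k
        resultants.append(norm / int(mask.sum()))  # mean resultant length of class k
    r = min(float(np.mean(resultants)), 1.0 - 1e-6)  # keep kappa finite
    kappa = r * (d - r**2) / (1.0 - r**2)
    return np.stack(mus), kappa


def vmf_log_posteriors(features, mean_directions, kappa):
    """log p(k | x) for an equal-prior vMF mixture with a shared kappa.

    With a shared kappa the normalizer C_d(kappa) cancels across classes,
    so the posterior is a softmax over kappa * mu_k^T x.
    """
    logits = kappa * (l2_normalize(features) @ l2_normalize(mean_directions).T)
    logits = logits - logits.max(axis=1, keepdims=True)  # stable log-softmax
    return logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))


def vmf_nll(features, labels, mean_directions, kappa):
    """Mean negative log-posterior of each feature's own class."""
    logp = vmf_log_posteriors(features, mean_directions, kappa)
    return float(-np.mean(logp[np.arange(len(labels)), labels]))


# Toy usage: two classes clustered around orthogonal directions in 4-D.
rng = np.random.default_rng(0)
centers = np.eye(4)[:2]
feats = np.concatenate([centers[0] + 0.1 * rng.standard_normal((50, 4)),
                        centers[1] + 0.1 * rng.standard_normal((50, 4))])
labels = np.repeat([0, 1], 50)
mu, kappa = estimate_vmf_params(feats, labels, num_classes=2)
print(kappa, vmf_nll(feats, labels, mu, kappa))
```

Minimizing a loss like vmf_nll pulls each embedding toward its class mean direction, which is one way to keep same-class embeddings consistent across inputs. The paper may instead use per-class concentrations or a different objective.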

Findings

01 · Outperforms existing methods in downstream tasks

02 · Mitigates semantic feature conflicts in contrastive learning

03 · Provides a scalable framework for 3D representation enhancement

Abstract

Contrastive image-to-LiDAR knowledge transfer, commonly used for learning 3D representations with synchronized images and point clouds, often faces a self-conflict dilemma. This issue arises as contrastive losses unintentionally dissociate features of unmatched points and pixels that share semantic labels, compromising the integrity of learned representations. To overcome this, we harness Visual Foundation Models (VFMs), which have revolutionized the acquisition of pixel-level semantics, to enhance 3D representation learning. Specifically, we utilize off-the-shelf VFMs to generate semantic labels for weakly-supervised pixel-to-point contrastive distillation. Additionally, we employ von Mises-Fisher distributions to structure the feature space, ensuring semantic embeddings within the same class remain consistent across varying inputs. Furthermore, we adapt sampling probabilities of…
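The self-conflict described above shows up directly in a plain InfoNCE loss: every non-matching pixel counts as a negative, including pixels of the same class. Below is a minimal NumPy sketch of one simple remedy, assuming row i of point_feats and row i of pixel_feats form a matched point-pixel pair and labels holds one semantic label per pair (for example, a pseudo-label from a VFM). Same-label, non-matching pixels are dropped from the denominator instead of being pushed away. The function name label_aware_infonce and the temperature tau=0.07 are hypothetical choices; this is not the paper's exact loss, which the truncated abstract does not fully specify.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def label_aware_infonce(point_feats, pixel_feats, labels, tau=0.07):
    """Point-to-pixel InfoNCE that ignores same-label, non-matching pixels.

    Row i of point_feats is the positive partner of row i of pixel_feats.
    Hypothetical sketch, not the paper's implementation.
    """
    p = l2_normalize(point_feats)
    q = l2_normalize(pixel_feats)
    logits = (p @ q.T) / tau                                   # (N, N) scaled cosine similarities
    same_label = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    logits = np.where(same_label & off_diag, -np.inf, logits)  # drop false negatives
    m = logits.max(axis=1, keepdims=True)                      # diagonal keeps every row finite
    log_denom = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))
    return float(np.mean(log_denom - np.diag(logits)))


# Toy usage: matched pairs are noisy copies of each other.
rng = np.random.default_rng(0)
points = rng.standard_normal((8, 16))
pixels = points + 0.1 * rng.standard_normal((8, 16))
labels = np.array([0, 0, 1, 1, 2, 2, 3, 3])                  # toy pseudo-labels
plain = label_aware_infonce(points, pixels, np.arange(8))      # unique labels: plain InfoNCE
aware = label_aware_infonce(points, pixels, labels)
print(plain, aware)
```

Masking false negatives can never raise the loss relative to plain InfoNCE on the same features. A supervised-contrastive variant that instead treats same-label pixels as extra positives is an equally plausible reading of weakly-supervised distillation.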

Code & Models

Repositories

eaphan/olivine · PyTorch · Official

Videos

Fine-grained Image-to-LiDAR Contrastive Distillation with Visual Foundation Models · SlidesLive

Taxonomy

Topics: Industrial Vision Systems and Defect Detection