LidarCLIP or: How I Learned to Talk to Point Clouds

Georg Hess, Adam Tonderski, Christoffer Petersson, Kalle Åström, Lennart Svensson

arXiv:2212.06858 · cs.CV · May 3, 2023 · 1 citation

TL;DR

LidarCLIP introduces a novel method to relate lidar point clouds to text and images by mapping them into a shared CLIP embedding space, enabling zero-shot classification, retrieval, and cross-modal applications in autonomous driving.

Contribution

The paper presents LidarCLIP, the first model to connect lidar data with CLIP embeddings. A point cloud encoder is trained on image-lidar pairs to match image CLIP embeddings, which enables cross-modal retrieval and zero-shot tasks without any text-lidar training data.

Findings

01

Lidar-based retrieval with LidarCLIP is generally on par with image-based retrieval, with complementary strengths and weaknesses.

02

Combining lidar and image features improves retrieval over either single modality and enables targeted search for challenging scenarios.

03

LidarCLIP significantly outperforms previous CLIP-based methods for point cloud classification.
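To make the retrieval findings concrete, here is a minimal sketch of text-driven scene retrieval that combines image and lidar embeddings. It is an illustration under assumptions, not the authors' implementation: the summary does not say how the two modalities are fused, so averaging the normalized embeddings is a choice made here, and `rank_scenes` and the two-dimensional toy vectors are hypothetical stand-ins for real CLIP embeddings.

```python
import math


def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_scenes(text_embedding, image_embeddings, lidar_embeddings):
    """Return scene indices sorted by similarity to the text query, best first.

    Assumption: the joint feature is the re-normalized sum of the unit-length
    image and lidar embeddings. The summary does not specify the fusion rule.
    """
    query = normalize(text_embedding)
    scores = []
    for image_emb, lidar_emb in zip(image_embeddings, lidar_embeddings):
        summed = [a + b for a, b in zip(normalize(image_emb), normalize(lidar_emb))]
        scores.append(dot(query, normalize(summed)))
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy data: three scenes, each with an image embedding and a lidar embedding.
text_query = [1.0, 0.0]
image_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
lidar_embs = [[1.0, 0.1], [0.2, 1.0], [0.0, 1.0]]
ranking = rank_scenes(text_query, image_embs, lidar_embs)
```

Because every embedding lives in the same space, the same scoring function also covers zero-shot classification: embed one text prompt per class and pick the class whose prompt scores highest for a given scene.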

Abstract

Research connecting text and images has recently seen several breakthroughs, with models like CLIP, DALL-E 2, and Stable Diffusion. However, the connection between text and other visual modalities, such as lidar data, has received less attention, prohibited by the lack of text-lidar datasets. In this work, we propose LidarCLIP, a mapping from automotive point clouds to a pre-existing CLIP embedding space. Using image-lidar pairs, we supervise a point cloud encoder with the image CLIP embeddings, effectively relating text and lidar data with the image domain as an intermediary. We show the effectiveness of LidarCLIP by demonstrating that lidar-based retrieval is generally on par with image-based retrieval, but with complementary strengths and weaknesses. By combining image and lidar features, we improve upon both single-modality methods and enable a targeted search for challenging…
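The training idea in the abstract, supervising a point cloud encoder with image CLIP embeddings from image-lidar pairs, can be sketched as a simple embedding-matching loss. This is a hedged illustration: the abstract does not name the loss function, so the cosine distance used here is an assumption, and `distillation_loss` and the toy vectors are hypothetical. In a real setup the image embeddings would come from a CLIP image encoder and the lidar embeddings from the trainable point cloud encoder.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot_product = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot_product / (norm_u * norm_v)


def distillation_loss(lidar_embeddings, image_embeddings):
    """Mean cosine distance between paired lidar and image embeddings.

    The image embeddings act as fixed targets; only the lidar encoder that
    produced `lidar_embeddings` would be updated to reduce this value.
    Assumption: cosine distance is used here for illustration only.
    """
    distances = [
        1.0 - cosine_similarity(lidar_emb, image_emb)
        for lidar_emb, image_emb in zip(lidar_embeddings, image_embeddings)
    ]
    return sum(distances) / len(distances)


# Toy targets standing in for CLIP image embeddings of two scenes.
image_targets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

# Lidar embeddings pointing in the same directions as their paired targets.
aligned_lidar = [[2.0, 0.0, 0.0], [0.0, 0.5, 0.0]]

# Lidar embeddings orthogonal to their paired targets.
misaligned_lidar = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]

loss_aligned = distillation_loss(aligned_lidar, image_targets)
loss_misaligned = distillation_loss(misaligned_lidar, image_targets)
```

The aligned pairs give a loss near zero and the orthogonal pairs give a loss near one, so minimizing this quantity pulls each lidar embedding toward the embedding of its paired image. Since CLIP text embeddings share that space, the lidar encoder becomes comparable to text without ever seeing a caption.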

Code & Models

Repositories

atonderski/lidarclip
PyTorch · Official

Videos

LidarCLIP or: How I Learned To Talk to Point Clouds · YouTube

Taxonomy

Topics: Advanced Neural Network Applications · Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques

Methods: Contrastive Language-Image Pre-training · Diffusion