The Devil is in the Details: Simple Remedies for Image-to-LiDAR Representation Learning

Wonjun Jo, Kwon Byung-Ki, Kim Ji-Yeon, Hawook Jeong, Kyungdon Joo, and Tae-Hyun Oh

arXiv:2501.09485 · cs.CV · January 17, 2025

TL;DR

This paper shows that basic design choices in image-to-LiDAR representation learning, such as the LiDAR coordinate system and data utilization, affect performance more than complex loss-function innovations do.

Contribution

The paper identifies overlooked design elements in spatial and temporal data handling and shows that simple fixes to them substantially improve downstream 3D perception tasks.

Findings

01. 16% improvement in 3D semantic segmentation on nuScenes

02. 13% improvement in 3D object detection on KITTI

03. Simple design fixes outperform complex loss functions

Abstract

LiDAR is a crucial sensor in autonomous driving, commonly used alongside cameras. By exploiting this camera-LiDAR setup and recent advances in image representation learning, prior studies have shown the promising potential of image-to-LiDAR distillation. These prior works focus on designing their own losses to effectively distill pre-trained 2D image representations into a 3D model. However, the other parts of the design have remained surprisingly unexplored. We find that fundamental design elements overlooked in prior works, e.g., the LiDAR coordinate system, quantization according to the existing input interface, and data utilization, are more critical than developing loss functions. In this work, we show that simple fixes to these designs notably outperform existing methods by 16% in 3D semantic segmentation on the nuScenes dataset and 13% in 3D object detection…

Taxonomy

Topics: Advanced Neural Network Applications · Medical Imaging and Analysis · Medical Image Segmentation Techniques

Methods: Convolution · Focus