The Devil is in the Details: Simple Remedies for Image-to-LiDAR Representation Learning
Wonjun Jo, Kwon Byung-Ki, Kim Ji-Yeon, Hawook Jeong, Kyungdon Joo, and Tae-Hyun Oh

TL;DR
This paper shows that simple design choices in image-to-LiDAR representation learning, such as the LiDAR coordinate system and data utilization, affect performance more than complex loss-function innovations.
Contribution
It identifies overlooked design elements in spatial and temporal data handling and demonstrates simple fixes that greatly improve downstream 3D perception tasks.
Findings
16% improvement in 3D semantic segmentation on nuScenes
13% improvement in 3D object detection on KITTI
Simple design fixes outperform complex loss functions
Abstract
LiDAR is a crucial sensor in autonomous driving, commonly used alongside cameras. By exploiting this camera-LiDAR setup and recent advances in image representation learning, prior studies have shown the promising potential of image-to-LiDAR distillation. These prior arts focus on the designs of their own losses to effectively distill the pre-trained 2D image representations into a 3D model. However, the other parts of the designs have been surprisingly unexplored. We find that fundamental design elements, e.g., the LiDAR coordinate system, quantization according to the existing input interface, and data utilization, are more critical than developing loss functions, which have been overlooked in prior works. In this work, we show that simple fixes to these designs notably outperform existing methods by 16% in 3D semantic segmentation on the nuScenes dataset and 13% in 3D object detection…
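To make the "LiDAR coordinate system" and "quantization" design elements concrete, the sketch below quantizes the same point cloud in two common ways: a uniform Cartesian grid and a cylindrical (range, azimuth, height) grid. The abstract does not state which coordinate system or voxel sizes the authors adopt, so this is only an illustration of the kind of choice involved, not the paper's method. The function names, bin sizes, and sample points are all hypothetical.

```python
import numpy as np


def voxelize_cartesian(points, voxel_size=0.1):
    """Quantize (N, 3) xyz points to integer indices on a uniform Cartesian grid.

    voxel_size is an assumed value, not one taken from the paper.
    """
    return np.floor(points / voxel_size).astype(np.int64)


def voxelize_cylindrical(points, rho_size=0.1, phi_bins=360, z_size=0.1):
    """Quantize (N, 3) xyz points on a cylindrical (rho, phi, z) grid.

    rho is the horizontal range, phi the azimuth angle. All bin sizes are
    assumed values chosen for illustration.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rho = np.hypot(x, y)       # horizontal distance from the sensor
    phi = np.arctan2(y, x)     # azimuth in [-pi, pi]
    rho_idx = np.floor(rho / rho_size).astype(np.int64)
    # Map azimuth to [0, phi_bins) and wrap the +pi edge back to bin 0.
    phi_idx = np.floor((phi + np.pi) / (2 * np.pi) * phi_bins).astype(np.int64)
    phi_idx = phi_idx % phi_bins
    z_idx = np.floor(z / z_size).astype(np.int64)
    return np.stack([rho_idx, phi_idx, z_idx], axis=1)


def count_occupied(voxel_indices):
    """Number of distinct voxels that contain at least one point."""
    return len(np.unique(voxel_indices, axis=0))


if __name__ == "__main__":
    # Two hypothetical points, one in front of and one behind the sensor.
    pts = np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.5]])
    print(voxelize_cartesian(pts, voxel_size=0.5))
    print(voxelize_cylindrical(pts, rho_size=0.5, phi_bins=360, z_size=0.5))
```

In the Cartesian grid every voxel covers the same volume. In the cylindrical grid a fixed azimuth bin gets wider as range increases, so the two schemes group points differently for the same input.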
Taxonomy
Topics: Advanced Neural Network Applications · Medical Imaging and Analysis · Medical Image Segmentation Techniques
Methods: Convolution · Focus
