LiDAR-Anchored Collaborative Distillation for Robust 2D Representations
Wonjun Jo, Hyunwoo Ha, Kim Ji-Yeon, Hawook Jeong, Tae-Hyun Oh

TL;DR
This paper introduces a collaborative distillation method that uses LiDAR data as self-supervision to enhance the robustness of 2D image encoders in adverse weather and noisy conditions, while maintaining their original performance.
Contribution
It presents a novel self-supervised learning approach leveraging 3D LiDAR for improving 2D encoder robustness and 3D awareness in challenging environments.
Findings
Outperforms existing methods in various downstream tasks.
Demonstrates strong generalization across diverse conditions.
Enhances 3D awareness from LiDAR data.
Abstract
As deep learning continues to advance, self-supervised learning has made considerable strides. It allows 2D image encoders to extract useful features for various downstream tasks, including those related to vision-based systems. Nevertheless, pre-trained 2D image encoders fall short under noisy and adverse weather conditions beyond clear daytime scenes, which call for robust visual perception. To address these issues, we propose a novel self-supervised approach, Collaborative Distillation, which leverages 3D LiDAR as self-supervision to improve the robustness of 2D image encoders to noisy and adverse weather conditions while retaining their original capabilities. Our method outperforms competing methods in various downstream tasks across diverse conditions and exhibits strong generalization ability. In addition, our method improves 3D awareness…
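As a rough illustration of the idea in the abstract, the sketch below shows one generic way a 2D student encoder could be pulled toward LiDAR-derived features while staying close to its original features. This is not the authors' implementation: the cosine-based loss, the retention term, and the names `distillation_loss`, `combined_loss`, and `retain_weight` are all assumptions made for illustration, and the feature vectors are assumed to be already paired (one LiDAR feature per image feature).

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)  # epsilon avoids division by zero


def distillation_loss(student_feats, target_feats):
    """Mean (1 - cosine similarity) over paired feature vectors.

    The targets play the role of the stop-gradient side: in a real training
    loop they would be treated as constants (assumption, not from the paper).
    """
    if not student_feats or len(student_feats) != len(target_feats):
        raise ValueError("need the same non-zero number of student and target features")
    per_pair = [1.0 - cosine_similarity(s, t)
                for s, t in zip(student_feats, target_feats)]
    return sum(per_pair) / len(per_pair)


def combined_loss(student_feats, lidar_feats, frozen_2d_feats, retain_weight=1.0):
    """Hypothetical objective: align with LiDAR features for robustness and
    with the frozen original 2D encoder's features to retain capabilities."""
    robustness_term = distillation_loss(student_feats, lidar_feats)
    retention_term = distillation_loss(student_feats, frozen_2d_feats)
    return robustness_term + retain_weight * retention_term
```

In this sketch, `retain_weight` trades off the two goals the abstract names: a larger value keeps the student closer to its pre-trained behavior, and a smaller value lets the LiDAR signal dominate.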
Peer Reviews
Decision·Submitted to ICLR 2026
**Significance**
- This work deals with an important and highly practical problem for real-world applications like autonomous driving: the reliability of perception models under challenging environmental conditions.
- This is definitely an area that is worth studying further.
**Clarity**
- I find the paper well written with multiple visualizations, qualitative visualizations, and detailed implementation information.
- The paper is in general easy to follow and the reasoning of the authors is cl
**Not completely valid assumptions**
- The authors argue that 3D LiDAR representations are highly robust to adverse weather conditions compared to 2D image representations. While this is true for nighttime conditions, LiDAR is quite brittle under rain, fog, and snow [a], [b], [c].
- LiDAR does shine, however, in precise 3D information and geometry, which might in fact be one of the main contributing factors to the performance boosts here.
- The authors assume that the LVD-142M dataset upon which DINOv2
The idea is simple and straightforward to understand. The figure is easy to understand, and the paper is easy to read. The work leverages complementary sensor properties, which is considered a viable way to handle nuisance variables.
The selection of DINOv2 is questionable. The evaluation dataset is limited to the nuScenes dataset. The t-SNE plot (Fig. 3) is not convincing. The Waymo dataset, which also contains adverse weather conditions, would be another dataset to test on.
(+) The manuscript addresses an important and realistic gap: the lack of robustness of vision foundation models (e.g., DINOv2) under adverse conditions. The motivation for using LiDAR as a more stable and weather-invariant modality for self-supervision is clearly justified and timely.
(+) Extensive experiments on nuScenes, nuImages, and some other benchmarks (KITTI, NYUd, ADE20k, Cityscapes) confirm that CD consistently improves robustness and generalization. The use of both few-shot and full-l
(-) While the framework is well-motivated and cleanly executed, it primarily combines known techniques (cross-modal distillation, bidirectional matching, and stop-gradient supervision) into a new use case. The innovation lies more in formulation and application (LiDAR as a teacher for robustness) than in new architecture or loss design.
(-) Lack of discussions with several closely related works. There is a line of 2D-3D cross-modal knowledge transfer work, which, in this manuscript, was omitt
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Face recognition and analysis
