Selective Transfer Learning of Cross-Modality Distillation for Monocular 3D Object Detection
Rui Ding, Meng Yang, and Nanning Zheng

TL;DR
This paper introduces MonoSTL, a selective transfer learning method that improves cross-modality distillation for monocular 3D object detection by addressing modality gaps and feature overfitting, leading to state-of-the-art results.
Contribution
The paper proposes a novel selective distillation approach with DASFD and DASRD modules to enhance depth transfer from LiDAR to image networks, overcoming negative transfer issues.
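As a rough illustration of what "selective" distillation can mean, the sketch below computes a feature-matching loss only at locations where a per-location uncertainty score is low. This is a generic stand-in, not the paper's DASFD or DASRD modules, whose details are not given on this page; the function name, the `uncertainty` input, and the `threshold` value are all hypothetical.

```python
def selective_feature_distillation_loss(student, teacher, uncertainty, threshold=0.5):
    """Mean squared error between student and teacher features,
    averaged only over locations whose uncertainty is below `threshold`.

    All three inputs are flat lists of floats of equal length.
    The uncertainty scores and the threshold are illustrative assumptions.
    """
    if not (len(student) == len(teacher) == len(uncertainty)):
        raise ValueError("student, teacher and uncertainty must have equal length")

    total = 0.0
    kept = 0
    for s, t, u in zip(student, teacher, uncertainty):
        if u < threshold:  # keep only locations considered reliable
            total += (s - t) ** 2
            kept += 1

    # If every location is filtered out, there is nothing to distill.
    return total / kept if kept else 0.0


# Toy example: the third location is too uncertain and is skipped.
student = [0.2, 0.9, 0.4]
teacher = [0.0, 1.0, 0.9]
uncertainty = [0.1, 0.2, 0.8]
loss = selective_feature_distillation_loss(student, teacher, uncertainty)
```

A hard threshold is the simplest choice here; a soft weighting of each location by its confidence would be an equally plausible variant under the same assumptions.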
Findings
Significant accuracy improvements on the KITTI and nuScenes datasets.
Effective mitigation of negative transfer caused by modality gap.
Compatibility with various CNN and DETR-based models.
Abstract
Monocular 3D object detection is a promising yet ill-posed task for autonomous vehicles due to the lack of accurate depth information. Cross-modality knowledge distillation can effectively transfer depth information from LiDAR to an image-based network. However, the modality gap between image and LiDAR seriously limits its accuracy. In this paper, we systematically investigate, for the first time, the negative transfer problem induced by the modality gap in cross-modality distillation, including not only the architecture inconsistency issue but, more importantly, the feature overfitting issue. We propose a selective learning approach named MonoSTL to overcome these issues, which encourages positive transfer of depth information from LiDAR while alleviating negative transfer on the image-based network. On the one hand, we utilize similar architectures to ensure spatial alignment of features between…
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition
