UniDistill: A Universal Cross-Modality Knowledge Distillation Framework for 3D Object Detection in Bird's-Eye View
Shengchao Zhou, Weizhou Liu, Chen Hu, Shuchang Zhou, and Chao Ma

TL;DR
UniDistill is a universal framework that enhances single-modality 3D object detectors by distilling knowledge from multi-modality detectors in Bird's-Eye View, improving accuracy without extra inference costs.
Contribution
It introduces a cross-modality knowledge distillation method in BEV for 3D detection, supporting various modality combinations and filtering background noise.
Findings
Improves the mAP and NDS of student detectors by 2.0% to 3.2% on nuScenes.
Supports multiple distillation paths including LiDAR-to-camera and fusion-to-LiDAR.
Effectively filters background and balances object sizes during distillation.
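The background filtering and size balancing described above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes teacher and student features already lie on the same BEV grid, that boxes are axis-aligned and given in grid-cell units, and that each object is represented by nine key points (center, four corners, four edge midpoints). The names `box_keypoints` and `sparse_feature_loss` are hypothetical.

```python
import numpy as np

def box_keypoints(cx, cy, w, l):
    """Return 9 BEV key points of an axis-aligned box: the center,
    the 4 corners, and the 4 edge midpoints (assumed layout)."""
    xs = (cx - w / 2, cx, cx + w / 2)
    ys = (cy - l / 2, cy, cy + l / 2)
    return [(x, y) for y in ys for x in xs]

def sparse_feature_loss(teacher, student, boxes):
    """Mean absolute feature difference, evaluated at box key points only.

    teacher, student: (C, H, W) arrays on the same BEV grid.
    boxes: list of (cx, cy, w, l) tuples in grid-cell units.
    Cells outside the key points (background) never enter the loss.
    """
    _, H, W = teacher.shape
    total, count = 0.0, 0
    for box in boxes:
        for x, y in box_keypoints(*box):
            # Snap the key point to the nearest in-bounds grid cell.
            col = int(np.clip(round(x), 0, W - 1))
            row = int(np.clip(round(y), 0, H - 1))
            total += np.abs(teacher[:, row, col] - student[:, row, col]).mean()
            count += 1
    # Every box contributes exactly 9 terms, regardless of its area.
    return total / max(count, 1)
```

Because each box contributes the same number of points, a large vehicle and a small pedestrian carry equal weight in this sketch, and a mismatch in a background cell leaves the loss unchanged. A real pipeline would also need to handle rotated boxes and differentiable sampling, which are omitted here.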
Abstract
In 3D object detection for autonomous driving, sensor configurations are diverse and complex, spanning both multi-modality and single-modality setups. Multi-modal methods add system complexity, while single-modal methods are relatively less accurate, so trading off between the two is difficult. In this work, we propose a universal cross-modality knowledge distillation framework (UniDistill) to improve the performance of single-modality detectors. Specifically, during training, UniDistill projects the features of both the teacher and the student detector into Bird's-Eye View (BEV), a representation that suits different modalities. Then, three distillation losses are calculated to sparsely align the foreground features, helping the student learn from the teacher without introducing additional cost during inference. Taking advantage of the similar detection…
Taxonomy
Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Robotics and Sensor-Based Localization
Methods: Knowledge Distillation · ALIGN
