TL;DR
This paper introduces AMFD, a distillation framework that uses the teacher network's original multi-modal features, rather than only its fusion features, to train the student network for multispectral pedestrian detection, with the aim of reducing inference time while improving detection accuracy.
Contribution
The proposed AMFD framework fully utilizes teacher network features with a new Modal Extraction Alignment module, enabling effective fusion without extra fusion modules.
Findings
Outperforms state-of-the-art methods in reducing Miss Rate.
Improves mean Average Precision on multiple datasets.
Reduces inference time compared to double-stream networks.
Abstract
Multispectral pedestrian detection has been shown to be effective in improving performance within complex illumination scenarios. However, prevalent double-stream networks in multispectral detection employ two separate feature extraction branches for multi-modal data, leading to nearly double the inference time compared to single-stream networks utilizing only one feature extraction branch. This increased inference time has hindered the widespread employment of multispectral pedestrian detection in embedded devices for autonomous systems. To address this limitation, various knowledge distillation methods have been proposed. However, traditional distillation methods focus only on the fusion features and ignore the large amount of information in the original multi-modal features, thereby restricting the student network's performance. To tackle the challenge, we introduce the Adaptive…
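The abstract contrasts distilling only the fusion features with also using the teacher's original per-modality features. As a rough illustration of the second idea, the sketch below computes a weighted mean-squared error between one student feature vector and two teacher feature vectors (one per modality). It is a generic feature-distillation loss, not the paper's actual loss or its Modal Extraction Alignment module. The names `mse`, `modal_distillation_loss`, `w_rgb`, and `w_thermal`, the fixed scalar weights, and the flat-list feature representation are all assumptions made for illustration.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def modal_distillation_loss(
    student: Sequence[float],
    teacher_rgb: Sequence[float],
    teacher_thermal: Sequence[float],
    w_rgb: float = 0.5,       # hypothetical fixed weight for the visible-light branch
    w_thermal: float = 0.5,   # hypothetical fixed weight for the thermal branch
) -> float:
    """Pull the single student feature toward both teacher modal features.

    Assumption: fixed scalar weights stand in for whatever weighting a real
    method would learn or derive; this is not the AMFD formulation.
    """
    return w_rgb * mse(student, teacher_rgb) + w_thermal * mse(student, teacher_thermal)


# Toy features: the student matches the RGB teacher exactly and is offset
# by 1.0 from the thermal teacher in every element.
student = [0.5, 1.0, 1.5]
teacher_rgb = [0.5, 1.0, 1.5]
teacher_thermal = [1.5, 2.0, 2.5]

loss = modal_distillation_loss(student, teacher_rgb, teacher_thermal)
print(loss)  # 0.5
```

In this toy case the RGB term contributes nothing and the thermal term contributes 1.0, so equal weights give 0.5. A practical system would apply such a loss to spatial feature maps inside a training loop, and the weighting would typically be adaptive rather than fixed.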
