Weakly Aligned Cross-Modal Learning for Multispectral Pedestrian Detection
Lu Zhang, Xiangyu Zhu, Xiangyu Chen, Xu Yang, Zhen Lei, Zhiyong Liu

TL;DR
This paper introduces AR-CNN (Aligned Region CNN), an end-to-end framework for multispectral pedestrian detection that handles weakly aligned color-thermal image pairs through region feature alignment, feature re-weighting during fusion, and RoI jitter, improving detection accuracy when the two modalities are not strictly aligned.
Contribution
The paper proposes the AR-CNN model, which combines a Region Feature Alignment (RFA) module, feature re-weighting, and an RoI jitter strategy, and provides KAIST-Paired, a manually relabeled dataset for better multispectral alignment.
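As a rough illustration of the RoI jitter idea named above, the sketch below randomly shifts a region of interest so that a detector sees slightly misaligned boxes during training. The function name `jitter_roi`, the Gaussian noise model, and the `sigma_x`/`sigma_y` defaults are assumptions made for this example, not details taken from the paper.

```python
import random


def jitter_roi(box, sigma_x=0.05, sigma_y=0.05, rng=None):
    """Shift a box (x1, y1, x2, y2) by a random offset, keeping its size.

    Assumption: offsets are Gaussian and scaled by the box width and
    height. The paper's exact jitter formulation may differ.
    """
    rng = rng or random
    x1, y1, x2, y2 = box
    width, height = x2 - x1, y2 - y1
    dx = rng.gauss(0.0, sigma_x) * width   # horizontal shift in pixels
    dy = rng.gauss(0.0, sigma_y) * height  # vertical shift in pixels
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)


if __name__ == "__main__":
    # A seeded generator makes the jitter reproducible.
    print(jitter_roi((10, 20, 50, 100), rng=random.Random(0)))
```

Because the whole box is translated, its width and height are unchanged; only its position is perturbed.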
Findings
Improved detection accuracy on multispectral datasets.
Enhanced robustness to alignment shifts and device variations.
Effective feature fusion and alignment strategies demonstrated.
Abstract
Multispectral pedestrian detection has shown great advantages under poor illumination conditions, since the thermal modality provides complementary information for the color image. However, real multispectral data suffers from the position shift problem, i.e., the color-thermal image pairs are not strictly aligned, so that one object has different positions in different modalities. In deep learning based methods, this problem makes it difficult to fuse the feature maps from both modalities and confuses CNN training. In this paper, we propose a novel Aligned Region CNN (AR-CNN) to handle the weakly aligned multispectral data in an end-to-end way. Firstly, we design a Region Feature Alignment (RFA) module to capture the position shift and adaptively align the region features of the two modalities. Secondly, we present a new multimodal fusion method, which performs feature re-weighting to…
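To make the position shift problem concrete, the following minimal sketch crops the same region from two feature maps, moving the box in the second map by a given shift before cropping. The names `crop`, `aligned_region_pair`, `ref_map`, and `sensed_map` are hypothetical. The shift is passed in as an integer pixel offset, whereas in AR-CNN it would be predicted by the RFA module, and plain nested lists stand in for CNN feature maps.

```python
def crop(feature_map, box):
    """Return the sub-grid of a 2D list covered by box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in feature_map[y1:y2]]


def aligned_region_pair(ref_map, sensed_map, box, shift):
    """Crop box from ref_map and the shifted box from sensed_map.

    Assumption: shift = (dx, dy) is an integer pixel offset describing
    where the object appears in sensed_map relative to ref_map.
    """
    dx, dy = shift
    x1, y1, x2, y2 = box
    shifted_box = (x1 + dx, y1 + dy, x2 + dx, y2 + dy)
    return crop(ref_map, box), crop(sensed_map, shifted_box)


if __name__ == "__main__":
    # Toy maps: sensed_map is ref_map moved 2 columns right and 1 row down.
    ref_map = [[10 * r + c for c in range(10)] for r in range(10)]
    sensed_map = [
        [ref_map[r - 1][c - 2] if r >= 1 and c >= 2 else 0 for c in range(10)]
        for r in range(10)
    ]
    ref_feat, sensed_feat = aligned_region_pair(
        ref_map, sensed_map, box=(2, 3, 5, 6), shift=(2, 1)
    )
    print(ref_feat == sensed_feat)
```

With the correct shift the two crops cover the same content, which is the precondition for fusing region features from both modalities; with a zero shift they would not match.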
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Remote-Sensing Image Classification
