MambaST: A Plug-and-Play Cross-Spectral Spatial-Temporal Fuser for Efficient Pedestrian Detection

Xiangbo Gao, Asiegbu Miracle Kanu-Asiegbu, and Xiaoxiao Du

arXiv:2408.01037 · cs.CV · August 5, 2024

TL;DR

MambaST introduces a plug-and-play cross-spectral spatial-temporal fusion pipeline that enhances pedestrian detection in low-light conditions, combining thermal and RGB data efficiently for real-time autonomous driving applications.

Contribution

It presents a novel Multi-head Hierarchical Patching and Aggregation (MHHPA) structure for effective cross-spectral fusion, improving detection accuracy and efficiency over existing methods.
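This page names the MHHPA structure but does not describe its internals. As a rough illustration of the general idea of hierarchical patching only, the sketch below tokenises an RGB and a thermal feature map at several patch sizes, so that coarser levels yield shorter sequences. The function names, the patch sizes, the use of average pooling, and the simple RGB-then-thermal concatenation are all assumptions made for illustration; this is not the authors' implementation, and the sequence model that would consume these tokens is not shown.

```python
from typing import List, Sequence

Grid = List[List[float]]  # a single-channel 2D feature map


def avg_pool_patches(feature: Grid, patch: int) -> List[float]:
    """Split a 2D map into non-overlapping patch x patch tiles.

    Each tile is reduced to its mean, giving one token per tile
    (hypothetical helper; pooling choice is an assumption).
    """
    h, w = len(feature), len(feature[0])
    if h % patch or w % patch:
        raise ValueError("map size must be divisible by patch size")
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tile = [
                feature[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            ]
            tokens.append(sum(tile) / len(tile))
    return tokens


def hierarchical_tokens(
    rgb: Grid, thermal: Grid, patch_sizes: Sequence[int] = (1, 2, 4)
) -> List[List[float]]:
    """Build one token sequence per patch size.

    Within each sequence, RGB tokens come first and thermal tokens second
    (an assumed ordering, chosen only to keep the example simple).
    """
    return [
        avg_pool_patches(rgb, p) + avg_pool_patches(thermal, p)
        for p in patch_sizes
    ]


# Toy 4x4 maps: RGB holds 0..15 row by row, thermal is constant.
rgb = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
thermal = [[1.0] * 4 for _ in range(4)]

sequences = hierarchical_tokens(rgb, thermal)
lengths = [len(s) for s in sequences]
print(lengths)
```

Larger patch sizes shorten the sequence quadratically, which is one plausible reason a hierarchical scheme could reduce cost relative to processing every location at full resolution.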

Findings

01. Outperforms existing models in small-scale pedestrian detection.

02. Achieves superior accuracy in low-light conditions.

03. Offers an efficient alternative to Transformer-based models.

Abstract

This paper proposes MambaST, a plug-and-play cross-spectral spatial-temporal fusion pipeline for efficient pedestrian detection. Several challenges exist for pedestrian detection in autonomous driving applications. First, it is difficult to perform accurate detection using RGB cameras under dark or low-light conditions. Cross-spectral systems must be developed to integrate complementary information from multiple sensor modalities, such as thermal and visible cameras, to improve the robustness of the detections. Second, pedestrian detection models are latency-sensitive. Efficient and easy-to-scale detection models with fewer parameters are highly desirable for real-time applications such as autonomous driving. Third, pedestrian video data provides spatial-temporal correlations of pedestrian movement. It is beneficial to incorporate temporal as well as spatial information to enhance…

Code & Models

Repositories

xiangbogaobarry/mambast
PyTorch · Official

Taxonomy

Topics: Video Surveillance and Tracking Methods · Automated Road and Building Extraction · Autonomous Vehicle Technology and Safety

Methods: Linear Layer · Residual Connection · Multi-Head Attention · Attention Is All You Need · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Softmax · Absolute Position Encodings · Dense Connections