When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset
Yi Zhang, Wang Zeng, Sheng Jin, Chen Qian, Ping Luo, Wentao Liu

TL;DR
This paper introduces MMPedestron, a generalist multi-modal pedestrian detection model that can process multiple sensor inputs and their combinations, and establishes MMPD, a large-scale benchmark dataset for multi-modal pedestrian detection that includes the newly collected EventPed data.
Contribution
The paper presents a unified multi-modal pedestrian detection model and a comprehensive benchmark dataset, enabling effective processing of diverse sensor modalities and advancing multi-modal perception research.
Findings
MMPedestron achieves state-of-the-art results on multiple benchmarks.
The model surpasses specialized models in multi-modal pedestrian detection.
The MMPD benchmark covers a wide range of sensor modalities (RGB, IR, Depth, LiDAR, and Event) for robust evaluation.
Abstract
Recent years have witnessed increasing research attention towards pedestrian detection by taking advantage of different sensor modalities (e.g., RGB, IR, Depth, LiDAR, and Event). However, designing a unified generalist model that can effectively process diverse sensor modalities remains a challenge. This paper introduces MMPedestron, a novel generalist model for multimodal perception. Unlike previous specialist models that only process one or a pair of specific modality inputs, MMPedestron is able to process multiple modal inputs and their dynamic combinations. The proposed approach comprises a unified encoder for modal representation and fusion and a general head for pedestrian detection. We introduce two extra learnable tokens, i.e., MAA and MAF, for adaptive multi-modal feature fusion. In addition, we construct the MMPD dataset, the first large-scale benchmark for multi-modal…
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Infrastructure Maintenance and Monitoring · Automated Road and Building Extraction
Methods: Softmax · Attention Is All You Need
