Multi-modal Crowd Counting via Modal Emulation
Chenhao Wang, Xiaopeng Hong, Zhiheng Ma, Yupeng Wei, Yabin Wang, Xiaopeng Fan

TL;DR
This paper introduces a novel two-pass multi-modal crowd counting framework that employs modal emulation, attention mechanisms, and modality alignment to improve counting accuracy across RGB-Thermal and RGB-Depth datasets.
Contribution
It presents a new modal emulation-based framework with a two-pass approach, including cross-modal attention and alignment modules, to enhance multi-modal crowd counting performance.
Findings
Outperforms previous methods on RGB-Thermal datasets
Demonstrates effective modality alignment with modal consistency loss
Achieves superior accuracy in multi-modal crowd counting tasks
Abstract
Multi-modal crowd counting is a crucial task that uses multi-modal cues to estimate the number of people in crowded scenes. To overcome the gap between different modalities, we propose a modal emulation-based two-pass multi-modal crowd-counting framework that enables efficient modal emulation, alignment, and fusion. The framework consists of two key components: a \emph{multi-modal inference} pass and a \emph{cross-modal emulation} pass. The former utilizes a hybrid cross-modal attention module to extract global and local information and achieve efficient multi-modal fusion. The latter uses attention prompting to coordinate different modalities and enhance multi-modal alignment. We also introduce a modality alignment module that uses an efficient modal consistency loss to align the outputs of the two passes and bridge the semantic gap between modalities. Extensive experiments on both…
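The two ideas named in the abstract, cross-modal attention for fusion and a consistency loss between the two passes, can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not specify the hybrid attention module or the exact form of the modal consistency loss, so the code assumes plain scaled dot-product attention and a mean-squared-error stand-in. The names `cross_modal_attention`, `consistency_loss`, `rgb_tokens`, and `thermal_tokens` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries, keys, values):
    """Generic scaled dot-product attention (an assumption, not the paper's module).

    Queries come from one modality (e.g. RGB tokens); keys and values come
    from the other (e.g. thermal tokens). Each output row is a weighted
    average of the value rows.
    """
    dim = len(queries[0])
    fused = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        fused.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return fused


def consistency_loss(map_a, map_b):
    """Mean squared difference between two flattened density maps.

    Stand-in for a modal consistency loss: it is zero when the outputs of
    the two passes agree exactly.
    """
    if len(map_a) != len(map_b):
        raise ValueError("density maps must have the same length")
    return sum((a - b) ** 2 for a, b in zip(map_a, map_b)) / len(map_a)


# Toy inputs: 2 RGB tokens attend over 3 thermal tokens (feature size 2).
rgb_tokens = [[1.0, 0.0], [0.0, 1.0]]
thermal_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = cross_modal_attention(rgb_tokens, thermal_tokens, thermal_tokens)

# Toy density maps from the two passes; the predicted count is the map's sum.
inference_map = [0.5, 1.5, 0.0, 1.0]
emulation_map = [0.5, 1.0, 0.5, 1.0]
loss = consistency_loss(inference_map, emulation_map)
count = sum(inference_map)
```

In this sketch each fused RGB token is a convex combination of thermal tokens, and the loss penalizes disagreement between the two passes' density maps. A real model would use learned projections and operate on feature tensors rather than Python lists.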
Taxonomy
Topics: Evacuation and Crowd Dynamics · Data Visualization and Analytics · Anomaly Detection Techniques and Applications
Methods: Softmax · Attention Is All You Need · ALIGN
