Multi-scale Feature Aggregation for Crowd Counting
Xiaoheng Jiang, Xinyi Wu, Hisham Cholakkal, Rao Muhammad Anwer, Jiale Cao, Mingliang Xu, Bing Zhou, Yanwei Pang, Fahad Shahbaz Khan

TL;DR
This paper introduces MSFANet, a multi-scale feature aggregation network that addresses scale variation in crowd counting by combining two feature aggregation modules (ShortAgg and SkipAgg) with a local-and-global counting loss, and reports state-of-the-art results on four challenging datasets.
Contribution
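The paper proposes a novel multi-scale feature aggregation network built from two feature aggregation modules, short aggregation (ShortAgg) and skip aggregation (SkipAgg), together with a local-and-global counting loss, to improve crowd counting accuracy under scale variation.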
Findings
MSFANet outperforms previous methods on four challenging datasets.
The skip aggregation module effectively fuses features with different receptive fields.
The local-and-global counting loss improves handling of non-uniform crowd distributions.
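The local-and-global counting loss is only named in this summary, not defined. As a minimal sketch, assuming the loss adds a global term (absolute error of the total count) to a local term (mean absolute count error over non-overlapping patches of the density map), it could look like the pure-Python function below. The name `local_global_count_loss`, the `patch` size and the weight `alpha` are hypothetical; the paper's actual patch scheme, norm and weighting may differ.

```python
from typing import List

Grid = List[List[float]]  # a single-channel density map as nested lists


def region_count(density: Grid, r0: int, r1: int, c0: int, c1: int) -> float:
    """Sum of density values in rows [r0, r1) and columns [c0, c1)."""
    return sum(sum(row[c0:c1]) for row in density[r0:r1])


def local_global_count_loss(pred: Grid, gt: Grid, patch: int, alpha: float = 1.0) -> float:
    """Hypothetical local-and-global counting loss (not the paper's exact formula).

    Global term: |total predicted count - total ground-truth count|.
    Local term:  mean absolute count error over non-overlapping patch x patch regions.
    alpha:       assumed weight of the local term.
    """
    h, w = len(gt), len(gt[0])
    if len(pred) != h or any(len(row) != w for row in pred):
        raise ValueError("pred and gt must have the same shape")

    global_err = abs(region_count(pred, 0, h, 0, w) - region_count(gt, 0, h, 0, w))

    local_errs = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            # Slices past the border are clipped, so edge patches may be smaller.
            p = region_count(pred, r, r + patch, c, c + patch)
            g = region_count(gt, r, r + patch, c, c + patch)
            local_errs.append(abs(p - g))
    local_err = sum(local_errs) / len(local_errs)

    return global_err + alpha * local_err


if __name__ == "__main__":
    # Same total count (4 people), but the prediction puts them in the wrong quadrant.
    gt = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    pred = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
    print(local_global_count_loss(pred, gt, patch=2))  # 2.0
```

In the example the prediction has the correct total count, so a purely global loss would be zero, while the local term still penalizes the misplaced crowd. That is the intuition behind adding a local term for non-uniform crowd distributions.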
Abstract
Convolutional Neural Network (CNN) based crowd counting methods have achieved promising results in the past few years. However, scale variation remains a major challenge for accurate count estimation. In this paper, we propose a multi-scale feature aggregation network (MSFANet) that alleviates this problem to some extent. Specifically, our approach consists of two feature aggregation modules: short aggregation (ShortAgg) and skip aggregation (SkipAgg). The ShortAgg module aggregates the features of adjacent convolution blocks. Its purpose is to fuse features with different receptive fields gradually, from the bottom to the top of the network. The SkipAgg module directly propagates features with small receptive fields to features with much larger receptive fields. Its purpose is to promote the fusion of features with small and large receptive fields.…
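The abstract describes the two modules only at the level of connectivity. A minimal topology sketch follows, assuming single-channel feature maps stored as nested lists, element-wise addition as the fusion operator, and 2x2 average pooling to align resolutions between blocks. `short_agg`, `skip_agg` and the helper functions are hypothetical names; a real network would also apply learned convolutions, and the paper's exact fusion operators are not described in this summary.

```python
from typing import List

Grid = List[List[float]]  # single-channel feature map (assumption: no channel dimension)


def avg_pool2(x: Grid) -> Grid:
    """2x2 average pooling with stride 2 (assumes even height and width)."""
    return [
        [(x[r][c] + x[r][c + 1] + x[r + 1][c] + x[r + 1][c + 1]) / 4.0
         for c in range(0, len(x[0]), 2)]
        for r in range(0, len(x), 2)
    ]


def resize_to(x: Grid, h: int) -> Grid:
    """Halve x repeatedly until it has h rows (assumes a power-of-two size ratio)."""
    while len(x) > h:
        x = avg_pool2(x)
    return x


def add(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two equally sized maps (the assumed fusion operator)."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def short_agg(blocks: List[Grid]) -> List[Grid]:
    """Hypothetical ShortAgg: fuse each block with its already-fused lower neighbour,
    so features are mixed gradually from the bottom to the top of the network."""
    fused = [blocks[0]]
    for f in blocks[1:]:
        fused.append(add(f, resize_to(fused[-1], len(f))))
    return fused


def skip_agg(blocks: List[Grid], top: Grid) -> Grid:
    """Hypothetical SkipAgg: add the shallowest block directly to the top feature."""
    return add(top, resize_to(blocks[0], len(top)))


if __name__ == "__main__":
    f1 = [[1.0] * 4 for _ in range(4)]  # stands in for a shallow, high-resolution block
    f2 = [[2.0] * 2 for _ in range(2)]
    f3 = [[3.0]]                        # stands in for a deep, low-resolution block
    fused = short_agg([f1, f2, f3])
    print(fused[-1])                           # [[6.0]]
    print(skip_agg([f1, f2, f3], fused[-1]))   # [[7.0]]
```

Here the top feature receives the shallowest block along two routes: indirectly through every ShortAgg stage, and in a single hop through the SkipAgg path. This mirrors the abstract's distinction between gradual fusion of adjacent blocks and direct propagation of small-receptive-field features.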
Taxonomy
Topics: Video Surveillance and Tracking Methods · Human Mobility and Location-Based Analysis · Anomaly Detection Techniques and Applications
