Learning Spatial Awareness to Improve Crowd Counting
Zhi-Qi Cheng, Jun-Xiu Li, Qi Dai, Xiao Wu, Alexander Hauptmann

TL;DR
This paper introduces SPANet (SPatial Awareness Network), a deep learning architecture that brings spatial awareness into crowd counting through a new pixel-level loss, improving counting accuracy over existing methods.
Contribution
The paper proposes the Maximum Excess over Pixels (MEP) loss and the SPANet architecture to incorporate spatial context into crowd counting models, addressing the limitations of the traditional Euclidean loss.
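The limitation of the Euclidean loss can be made concrete with a toy NumPy illustration (not taken from the paper). It assumes the common convention that the estimated count is the sum of the predicted density map and that the Euclidean loss is half the mean squared per-pixel difference; the helper names density_to_count and euclidean_loss are hypothetical. In this sketch, a blurry prediction with the correct count scores better than a sharp prediction displaced by a single pixel.

```python
import numpy as np


def density_to_count(density: np.ndarray) -> float:
    """Estimated number of people = sum (discrete integral) of the density map."""
    return float(density.sum())


def euclidean_loss(pred: np.ndarray, gt: np.ndarray) -> float:
    """Pixel-wise Euclidean loss: half the mean squared difference.

    The 0.5 factor and per-pixel averaging are assumed normalisations.
    """
    diff = pred - gt
    return float(0.5 * np.mean(diff ** 2))


# Toy 5x5 ground truth: one head annotated at the centre pixel.
gt = np.zeros((5, 5))
gt[2, 2] = 1.0

# Sharp prediction, but the head is placed one pixel to the right.
shifted = np.zeros((5, 5))
shifted[2, 3] = 1.0

# Blurry prediction: the same total mass spread evenly over the image.
uniform = np.full((5, 5), 1.0 / 25)

for name, pred in [("shifted", shifted), ("uniform", uniform)]:
    print(name, round(density_to_count(pred), 4), round(euclidean_loss(pred, gt), 4))
# shifted 1.0 0.04
# uniform 1.0 0.0192
```

Both predictions give the right count, yet the pixel-wise loss prefers the flat map, so minimising it alone can push a model toward smooth outputs rather than sharp, correctly placed peaks.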
Findings
Outperforms state-of-the-art methods on all benchmark datasets.
Significantly improves crowd counting accuracy over baseline models.
Demonstrates robustness to noise and variations in crowd images.
Abstract
The aim of crowd counting is to estimate the number of people in images by leveraging annotations of the center positions of pedestrians' heads. Promising progress has been made with the prevalence of deep Convolutional Neural Networks. Existing methods widely employ the Euclidean distance (i.e., loss) to optimize the model, which, however, has two main drawbacks: (1) the loss has difficulty learning spatial awareness (i.e., the positions of heads) since it struggles to retain the high-frequency variation in the density map, and (2) the loss is highly sensitive to various kinds of noise in crowd counting, such as zero-mean noise, head size changes, and occlusions. Although the Maximum Excess over SubArrays (MESA) loss has been previously proposed to address the above issues by finding the rectangular subregion whose predicted density map has the maximum difference from the…
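The rectangular-subregion idea behind the MESA loss can be sketched as an exhaustive search over axis-aligned boxes using an integral image. This is a minimal sketch under the assumption that the MESA distance is the maximum, over all boxes B, of |sum over B of (pred - gt)|; mesa_distance is a hypothetical helper, the brute-force O(H^2 W^2) search is only meant for small maps, and the code does not implement the paper's MEP loss.

```python
import numpy as np


def mesa_distance(pred: np.ndarray, gt: np.ndarray):
    """Assumed MESA distance: max over axis-aligned boxes B of |sum_B (pred - gt)|.

    Exhaustive search: every box is scored in O(1) from an integral image.
    Returns (value, (top, left, bottom, right)) with bottom/right exclusive.
    """
    diff = np.asarray(pred, dtype=float) - np.asarray(gt, dtype=float)
    h, w = diff.shape
    # Zero-padded integral image: S[i, j] = sum of diff[:i, :j].
    S = np.zeros((h + 1, w + 1))
    S[1:, 1:] = diff.cumsum(axis=0).cumsum(axis=1)
    best, best_box = 0.0, None
    for top in range(h):
        for bottom in range(top + 1, h + 1):
            for left in range(w):
                for right in range(left + 1, w + 1):
                    # Sum of diff[top:bottom, left:right] via inclusion-exclusion.
                    box_sum = S[bottom, right] - S[top, right] - S[bottom, left] + S[top, left]
                    if abs(box_sum) > best:
                        best, best_box = abs(box_sum), (top, left, bottom, right)
    return float(best), best_box


# Toy 5x5 ground truth with one head at the centre pixel.
gt = np.zeros((5, 5))
gt[2, 2] = 1.0
shifted = np.zeros((5, 5))
shifted[2, 3] = 1.0
uniform = np.full((5, 5), 1.0 / 25)

value, box = mesa_distance(shifted, gt)
print(value)  # 1.0
value, box = mesa_distance(uniform, gt)
print(round(value, 4), box)  # 0.96 (2, 2, 3, 3)
```

Because the whole image is itself a rectangle, this distance is never smaller than the absolute count error; the search over boxes is what adds sensitivity to where the mismatch lies.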
Taxonomy
Topics: Video Surveillance and Tracking Methods · Anomaly Detection Techniques and Applications · Human Pose and Action Recognition
