Crowd Counting by Adaptively Fusing Predictions from an Image Pyramid
Di Kang, Antoni Chan

TL;DR
This paper introduces a novel crowd counting method that uses an image pyramid and adaptive prediction fusion with attention maps to handle scale variations and perspective distortions more effectively than previous CNN-based approaches.
Contribution
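It proposes replacing multi-column networks whose columns use different filter sizes (as in MCNN) with an image pyramid: the network processes several resized copies of the input image, and the per-scale density predictions are fused adaptively, pixel by pixel, using attention maps, which improves crowd counting accuracy.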
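The per-pixel fusion step can be sketched as follows. This is a minimal NumPy illustration, not the authors' code. It assumes that each pyramid level yields a density map and an attention (score) map, that both have already been resized to a common resolution, and that the attention scores are normalized across scales with a softmax at every pixel. The function names are hypothetical.

```python
import numpy as np

def softmax_over_scales(logits):
    """Normalize attention scores across the scale axis at every pixel.

    logits: array of shape (S, H, W), one unnormalized map per pyramid level.
    Returns weights of the same shape that sum to 1 over axis 0.
    """
    shifted = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=0, keepdims=True)

def fuse_pyramid_predictions(density_maps, attention_logits):
    """Weighted per-pixel sum of the per-scale density maps.

    density_maps, attention_logits: arrays of shape (S, H, W), assumed to be
    resized to a common resolution upstream (hypothetical preprocessing).
    """
    weights = softmax_over_scales(attention_logits)  # (S, H, W)
    return (weights * density_maps).sum(axis=0)      # (H, W) fused density map

# Usage with random stand-ins for three pyramid levels.
rng = np.random.default_rng(0)
dens = rng.random((3, 4, 4))
att = rng.normal(size=(3, 4, 4))
fused = fuse_pyramid_predictions(dens, att)
print(fused.shape)  # (4, 4)
print("estimated count:", fused.sum())
```

Because the weights at each pixel sum to one, the fused value always lies between the smallest and largest per-scale prediction at that pixel, so the attention maps can favor whichever scale best matches the local object size.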
Findings
Outperforms existing methods on three popular datasets
Effectively handles severe occlusion and large scale variations
Achieves superior density map estimation results
Abstract
Because of the powerful learning capability of deep neural networks, counting performance via density map estimation has improved significantly during the past several years. However, it is still very challenging due to severe occlusion, large scale variations, and perspective distortion. Scale variations (from image to image) coupled with perspective distortion (within one image) result in huge scale changes of the object size. Earlier methods based on convolutional neural networks (CNN) typically did not handle this scale variation explicitly, until Hydra-CNN and MCNN. MCNN uses three columns, each with different filter sizes, to extract features at different scales. In this paper, in contrast to using filters of different sizes, we utilize an image pyramid to deal with scale variations. It is more effective and efficient to resize the input fed into the network, as compared to using…
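The two ingredients the abstract relies on, resizing the input instead of the filters and reading the count off a density map, can be sketched in NumPy. This is a hypothetical illustration: the factor-of-two average-pooling pyramid and the nearest-neighbour upsampling are assumptions chosen for simplicity, not the resizing scheme reported in the paper. The property it does rely on holds generally for density-map counting: the count equals the sum of the density map, so a density map predicted at a coarse level and upsampled back to full resolution must be rescaled to keep that sum unchanged.

```python
import numpy as np

def downsample2x(img):
    """Halve height and width by 2x2 average pooling (odd edges are cropped).

    Works for grayscale (H, W) and color (H, W, C) arrays.
    """
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return img.reshape(h // 2, 2, w // 2, 2, *img.shape[2:]).mean(axis=(1, 3))

def build_pyramid(img, levels=3):
    """Return [full, 1/2, 1/4, ...] resolution copies of the input image.

    Assumption: a factor-of-two pyramid; the paper's actual scales may differ.
    """
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample2x(pyramid[-1]))
    return pyramid

def upsample_density(density, factor):
    """Nearest-neighbour upsampling of a 2-D density map that preserves the count.

    Each pixel is copied into a factor x factor block, so its value is divided
    by factor**2 to keep density.sum() unchanged.
    """
    up = np.repeat(np.repeat(density, factor, axis=0), factor, axis=1)
    return up / factor**2

def count_from_density(density):
    """The crowd count is the integral (sum) of the density map."""
    return float(density.sum())

# Usage: a three-level pyramid of a random RGB "image".
img = np.random.default_rng(0).random((64, 48, 3))
print([p.shape for p in build_pyramid(img, levels=3)])
# [(64, 48, 3), (32, 24, 3), (16, 12, 3)]
```

Resizing the input means the same filters see objects at several apparent sizes, which is the alternative to widening or multiplying the filters that the abstract contrasts with MCNN.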
Taxonomy
Topics: Video Surveillance and Tracking Methods · Image Enhancement Techniques · Visual Attention and Saliency Detection
