TL;DR
This paper introduces a CNN-based cascaded multi-task learning framework that jointly classifies each image into a crowd-count group and estimates its density map, targeting the non-uniform scale variations found in dense crowd scenes.
Contribution
It presents a novel end-to-end cascaded CNN architecture that incorporates high-level crowd count classification to improve density map accuracy in crowd counting.
Findings
Achieves lower count error than recent state-of-the-art methods
Produces higher-quality density maps than those methods
Results hold across highly challenging, publicly available datasets
Abstract
Estimating crowd count in densely crowded scenes is an extremely challenging task due to non-uniform scale variations. In this paper, we propose a novel end-to-end cascaded network of CNNs to jointly learn crowd count classification and density map estimation. Classifying crowd count into various groups is tantamount to coarsely estimating the total count in the image, thereby incorporating a high-level prior into the density estimation network. This enables the layers in the network to learn globally relevant discriminative features that aid in estimating highly refined density maps with lower count error. The joint training is performed in an end-to-end fashion. Extensive experiments on highly challenging publicly available datasets show that the proposed method achieves lower count error and better quality density maps than recent state-of-the-art methods.
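The joint objective described in the abstract can be sketched as a density-map regression loss plus a weighted classification loss over quantized count groups. This is a minimal illustration, not the authors' implementation: the function names, the equal-width binning, the `max_count` cap of 1000, the use of ten groups, and the `weight` of 1e-4 are all assumptions made for the example. The ground-truth count is taken as the sum of the ground-truth density map, which is the usual convention in density-based counting.

```python
import math


def count_to_class(count, max_count=1000.0, num_classes=10):
    """Quantize a crowd count into one of num_classes equal-width groups.

    Equal-width bins and the max_count cap are illustrative assumptions.
    """
    width = max_count / num_classes
    return min(int(count // width), num_classes - 1)


def density_loss(pred_map, gt_map):
    """Mean squared error between two density maps given as lists of rows."""
    n = sum(len(row) for row in gt_map)
    total = sum(
        (p - t) ** 2
        for pred_row, gt_row in zip(pred_map, gt_map)
        for p, t in zip(pred_row, gt_row)
    )
    return total / n


def class_loss(logits, label):
    """Softmax cross-entropy for one image, computed in a numerically stable way."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def joint_loss(pred_map, gt_map, logits, weight=1e-4, max_count=1000.0):
    """Density loss plus weighted count-group classification loss.

    The weight value is a hypothetical hyperparameter, not taken from the source.
    """
    gt_count = sum(sum(row) for row in gt_map)  # count = integral of density map
    label = count_to_class(gt_count, max_count, num_classes=len(logits))
    return density_loss(pred_map, gt_map) + weight * class_loss(logits, label)
```

In a real network both terms would be computed on tensors and backpropagated through shared early layers, which is what lets the coarse count group act as a prior on the density estimate. The estimated count for an image is then the sum of the predicted density map.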
