TL;DR
This paper introduces FIDT maps and a local maxima detection strategy to improve crowd localization accuracy, especially in dense scenes, outperforming existing methods across multiple datasets.
Contribution
The paper proposes a novel FIDT map representation and a local maxima detection method, enhancing crowd localization accuracy in dense scenes beyond prior density map regression approaches.
Findings
Achieves state-of-the-art localization performance on six crowd datasets.
Demonstrates robustness in extremely dense and negative scenes.
FIDT maps effectively eliminate overlaps in dense regions.
Abstract
In this paper, we focus on crowd localization, a crucial topic in crowd analysis. Most regression-based methods use convolutional neural networks (CNNs) to regress a density map, which cannot accurately locate individual instances in extremely dense scenes, for two reasons: 1) the density map consists of a series of blurry Gaussian blobs, and 2) severe overlaps exist in the dense regions of the density map. To tackle this issue, we propose a novel Focal Inverse Distance Transform (FIDT) map for the crowd localization task. Compared with density maps, FIDT maps accurately describe people's locations without overlap in dense regions. Based on the FIDT maps, a Local-Maxima-Detection-Strategy (LMDS) is derived to effectively extract the center point of each individual. Furthermore, we introduce an Independent SSIM (I-SSIM) loss to make the model tend to…
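To make the two central ideas concrete, the following is a minimal sketch of an inverse-distance-style target map and a local-maxima extraction step. It is not the authors' implementation. The abstract does not give the FIDT formula, so the focal form used here, 1 / (d^(alpha*d + beta) + c), and the values of alpha, beta, c, the 3x3 window, and the relative threshold are all assumptions chosen for illustration. The function names fidt_map and local_maxima are hypothetical.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt, maximum_filter


def fidt_map(points, shape, alpha=0.02, beta=0.75, c=1.0):
    """Build an inverse-distance map from head annotations given as (row, col).

    The focal form and the default alpha/beta/c are illustrative assumptions,
    not values taken from the text above.
    """
    if not points:
        return np.zeros(shape)  # image without people: all-zero target
    mask = np.ones(shape, dtype=bool)
    for r, col in points:
        mask[r, col] = False  # zeros mark annotated heads
    # Euclidean distance from every pixel to its nearest annotated head.
    dist = distance_transform_edt(mask)
    # The response is 1/c at a head (dist = 0) and decays with distance,
    # so each head yields a sharp peak instead of a wide Gaussian blob.
    return 1.0 / (np.power(dist, alpha * dist + beta) + c)


def local_maxima(fmap, window=3, rel_threshold=0.4):
    """Return (row, col) of pixels that are local maxima of fmap.

    window and rel_threshold are assumed values for this sketch.
    """
    if fmap.max() <= 0:
        return []  # nothing to detect in an empty map
    # A pixel is a candidate if it equals the maximum of its neighbourhood.
    neighbourhood_max = maximum_filter(fmap, size=window, mode="constant", cval=0.0)
    is_peak = (fmap == neighbourhood_max) & (fmap >= rel_threshold * fmap.max())
    return [(int(r), int(col)) for r, col in np.argwhere(is_peak)]


if __name__ == "__main__":
    heads = [(5, 5), (5, 8), (20, 30)]
    target = fidt_map(heads, (40, 40))
    print(local_maxima(target))
```

On a ground-truth map, the detected peaks coincide with the annotated heads because each head is the strict maximum of its neighbourhood. On a predicted map, the threshold serves to suppress low-response background peaks, and its value would need tuning.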
Taxonomy
Methods: Convolution
