TL;DR
This paper introduces PFDNet, a novel crowd counting method that models continuous scale variations using perspective-guided fractional dilation, improving accuracy and efficiency over existing multi-scale approaches.
Contribution
The paper proposes a perspective-guided fractional dilation convolution that models continuous pedestrian scales and introduces a perspective estimation branch for cases lacking perspective data.
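One way to read "fractional dilation" is as a 3×3 convolution whose dilation rate can take non-integer values and can vary from one location to the next. The formula below is a generic sketch under stated assumptions, not the paper's exact definition. The symbols are:

- $x$: an input feature map.
- $w$: the 3×3 kernel weights.
- $P(\mathbf{p})$: the perspective value at location $\mathbf{p}$.
- $g$: a hypothetical monotone mapping from perspective to dilation rate.

Off-grid positions of $x$ are assumed to be read by bilinear interpolation.

```latex
y(\mathbf{p}) = \sum_{\mathbf{k} \in \{-1,0,1\}^2} w(\mathbf{k})\, x\big(\mathbf{p} + r(\mathbf{p})\,\mathbf{k}\big),
\qquad r(\mathbf{p}) = g\big(P(\mathbf{p})\big)
```

Special cases follow directly from the formula:

- With $r(\mathbf{p}) = 1$ everywhere, it reduces to an ordinary 3×3 convolution.
- With a constant integer $r(\mathbf{p}) = d$, it is a standard dilated convolution with rate $d$.
- Non-integer rates place the kernel taps between grid points. This lets the receptive field track continuous scale changes instead of a few discrete scales.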
Findings
Outperforms state-of-the-art methods on multiple datasets
Achieves lower mean absolute error (MAE) than previous methods
More computationally efficient, since it avoids multi-scale or multi-column architectures
Abstract
Crowd counting is critical for numerous video surveillance scenarios. One of the main issues in this task is how to handle the dramatic scale variations of pedestrians caused by the perspective effect. To address this issue, this paper proposes a novel convolutional neural network-based crowd counting method, termed Perspective-guided Fractional-Dilation Network (PFDNet). By modeling the continuous scale variations, the proposed PFDNet is able to select the proper fractional dilation kernels for adapting to different spatial locations. It significantly improves the flexibility of state-of-the-art methods, which only consider discrete representative scales. In addition, by avoiding the multi-scale or multi-column architecture used in other methods, it is computationally more efficient. In practice, the proposed PFDNet is constructed by stacking multiple Perspective-guided…
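The NumPy sketch below illustrates the per-location idea from the abstract: each output pixel gets its own, possibly fractional, dilation rate derived from a perspective map. It works on a single-channel map. It rests on several assumptions that do not come from the paper:

- Off-grid kernel taps are read by bilinear sampling, in the style of deformable convolution. This is one plausible way to realize fractional dilation, not necessarily PFDNet's.
- `perspective_to_rate`, with its linear mapping and its `r_min`/`r_max` bounds, is a hypothetical helper.
- A real network would use learned multi-channel kernels inside a deep-learning framework.

```python
import numpy as np


def bilinear_sample(x, yy, xx):
    """Sample the 2-D array x at fractional coordinates (yy, xx); outside the image counts as zero."""
    h, w = x.shape
    y0 = np.floor(yy).astype(int)
    x0 = np.floor(xx).astype(int)
    wy1, wx1 = yy - y0, xx - x0  # distance past the lower grid point
    wy0, wx0 = 1.0 - wy1, 1.0 - wx1

    def at(yi, xi):
        # Gather x[yi, xi], contributing zero where the index falls outside the image.
        valid = (yi >= 0) & (yi < h) & (xi >= 0) & (xi < w)
        vals = np.zeros(yi.shape, dtype=float)
        vals[valid] = x[yi[valid], xi[valid]]
        return vals

    return (wy0 * wx0 * at(y0, x0) + wy0 * wx1 * at(y0, x0 + 1)
            + wy1 * wx0 * at(y0 + 1, x0) + wy1 * wx1 * at(y0 + 1, x0 + 1))


def fractional_dilated_conv3x3(x, weight, rate):
    """3x3 cross-correlation in which output pixel p uses its own (possibly fractional) dilation rate[p]."""
    h, w = x.shape
    rate = np.broadcast_to(np.asarray(rate, dtype=float), (h, w))
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    out = np.zeros((h, w), dtype=float)
    for i, dy in enumerate((-1, 0, 1)):
        for j, dx in enumerate((-1, 0, 1)):
            # Tap (dy, dx) is read at p + rate[p] * (dy, dx), generally between grid points.
            out += weight[i, j] * bilinear_sample(x, rows + rate * dy, cols + rate * dx)
    return out


def perspective_to_rate(perspective, r_min=1.0, r_max=3.0):
    """Hypothetical mapping: normalize the perspective map and scale it linearly to [r_min, r_max]."""
    p = np.asarray(perspective, dtype=float)
    p_norm = (p - p.min()) / (p.max() - p.min() + 1e-8)
    return r_min + p_norm * (r_max - r_min)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.random((32, 32))
    # Synthetic perspective map: values grow toward the bottom rows, where people appear larger.
    perspective = np.tile(np.linspace(0.0, 1.0, 32)[:, None], (1, 32))
    rate = perspective_to_rate(perspective)
    out = fractional_dilated_conv3x3(features, rng.standard_normal((3, 3)), rate)
    print(out.shape)  # (32, 32)
```

The loop runs over the nine kernel taps and samples the whole coordinate grid for each tap at once, which keeps the code vectorized. Two quick sanity checks:

- `rate=1.0` reproduces an ordinary zero-padded 3×3 cross-correlation.
- `rate=2.0` reproduces a standard dilated convolution.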
Taxonomy
Methods: Convolution
