Video Crowd Localization with Multi-focus Gaussian Neighborhood Attention and a Large-Scale Benchmark
Haopeng Li, Lingbo Liu, Kunlin Yang, Shinan Liu, Junyu Gao, Bin Zhao, Rui Zhang, Jun Hou

TL;DR
This paper introduces GNANet, a neural network that uses multi-focus Gaussian neighborhood attention to localize human heads in crowded videos, and VSCrowd, a large-scale benchmark dataset for the task. The authors report state-of-the-art results.
Contribution
The paper proposes a new attention mechanism and neural network architecture for improved video crowd localization, along with a comprehensive large-scale dataset for future research.
Findings
Achieves state-of-the-art performance on multiple datasets.
Effectively models spatial-temporal dependencies in crowded videos.
Demonstrates robustness to scale variations of human heads.
Abstract
Video crowd localization is a crucial yet challenging task, which aims to estimate the exact locations of human heads in given crowded videos. To model the spatial-temporal dependencies of human mobility, we propose a multi-focus Gaussian neighborhood attention (GNA), which can effectively exploit long-range correspondences while maintaining the spatial topological structure of the input videos. In particular, our GNA can also capture the scale variation of human heads well using the equipped multi-focus mechanism. Based on the multi-focus GNA, we develop a unified neural network called GNANet to accurately locate head centers in video clips by fully aggregating spatial-temporal information via a scene modeling module and a context cross-attention module. Moreover, to facilitate future research in this field, we introduce a large-scale crowd video benchmark named VSCrowd, which consists…
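The abstract does not spell out how the attention is computed, so the following is only an illustrative sketch of one way a Gaussian neighborhood attention with several "focuses" could look, not the authors' implementation. It assumes that each query position attends to a small set of key positions sampled from a Gaussian centered on the query's own location, that each focus corresponds to a different standard deviation, and that the per-focus outputs are simply averaged. The function names, the `sigmas` values, the sample count, and the absence of learned projections are all hypothetical choices made for brevity.

```python
import numpy as np


def gaussian_neighborhood_attention(query, key, sigma, num_samples, rng):
    """Attend from each query position to Gaussian-sampled key positions.

    query, key: feature maps of shape (H, W, C), e.g. two video frames.
    Returns an array of shape (H, W, C).
    """
    H, W, C = query.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")

    # Assumption: neighbor offsets are drawn from N(0, sigma^2) per query.
    offsets = rng.normal(0.0, sigma, size=(H, W, num_samples, 2))
    ny = np.clip(np.rint(ys[..., None] + offsets[..., 0]), 0, H - 1).astype(int)
    nx = np.clip(np.rint(xs[..., None] + offsets[..., 1]), 0, W - 1).astype(int)

    # Gather sampled neighbors: shape (H, W, K, C).
    neighbors = key[ny, nx]

    # Scaled dot-product attention restricted to the sampled neighborhood.
    logits = np.einsum("hwc,hwkc->hwk", query, neighbors) / np.sqrt(C)
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # Keys double as values here (no learned projections in this sketch).
    return np.einsum("hwk,hwkc->hwc", weights, neighbors)


def multi_focus_gna(query, key, sigmas=(1.0, 3.0, 6.0), num_samples=8, seed=0):
    """Average the attention output over several neighborhood sizes."""
    rng = np.random.default_rng(seed)
    outputs = [
        gaussian_neighborhood_attention(query, key, s, num_samples, rng)
        for s in sigmas
    ]
    return np.mean(outputs, axis=0)


if __name__ == "__main__":
    data_rng = np.random.default_rng(42)
    frame_t = data_rng.normal(size=(16, 20, 8))     # features of frame t
    frame_prev = data_rng.normal(size=(16, 20, 8))  # features of another frame
    fused = multi_focus_gna(frame_t, frame_prev)
    print(fused.shape)
```

In this sketch a small standard deviation keeps attention close to the query location, while a large one reaches distant positions. That is one plausible reading of how several focuses could cover both small and large heads, under the assumptions stated above.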
Taxonomy
Topics: Video Surveillance and Tracking Methods · Anomaly Detection Techniques and Applications · Human Pose and Action Recognition
