Drone-based Joint Density Map Estimation, Localization and Tracking with Space-Time Multi-Scale Attention Network
Longyin Wen, Dawei Du, Pengfei Zhu, Qinghua Hu, Qilong Wang, Liefeng Bo, Siwei Lyu

TL;DR
This paper introduces STANet, a multi-scale attention network for drone-based density estimation, localization, and tracking in dense crowds, along with a new large-scale drone crowd dataset, DroneCrowd.
Contribution
The paper presents a novel space-time multi-scale attention network and a large drone crowd dataset, enabling improved density map estimation, localization, and tracking from drone videos.
Findings
STANet outperforms existing methods on public datasets.
The DroneCrowd dataset contains 33,600 high-resolution frames and 20,800 trajectories.
End-to-end training with multi-task loss enhances performance.
Abstract
This paper proposes a space-time multi-scale attention network (STANet) to solve density map estimation, localization and tracking in dense crowds in video clips captured by drones with arbitrary crowd density, perspective, and flight altitude. Our STANet method aggregates multi-scale feature maps in sequential frames to exploit temporal coherency, and then predicts the density maps, localizes the targets, and associates them in crowds simultaneously. A coarse-to-fine process is designed to gradually apply the attention module on the aggregated multi-scale feature maps, pushing the network to exploit discriminative space-time features for better performance. The whole network is trained in an end-to-end manner with a multi-task loss formed by three terms, i.e., the density map loss, the localization loss and the association loss. The non-maximal suppression followed by the min-cost…
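The three-term multi-task loss described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the weighted-sum form, the weights `w_loc` and `w_assoc`, the pixel-wise mean squared error used for the density term, and all function names are assumptions made for illustration. The only part that is standard for density-map methods in general is that the crowd count is read off as the sum of the density map.

```python
import numpy as np


def density_map_loss(pred, target):
    """Pixel-wise mean squared error between two density maps.

    Assumption: MSE is used here only as a common choice for density
    regression; the paper's exact density loss is not given in this summary.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((pred - target) ** 2))


def multi_task_loss(density, localization, association, w_loc=1.0, w_assoc=1.0):
    """Combine the three loss terms into one scalar training objective.

    Assumption: a weighted sum with hypothetical weights w_loc and w_assoc.
    """
    return density + w_loc * localization + w_assoc * association


def estimated_count(density_map):
    """Crowd count implied by a density map: the sum over all pixels."""
    return float(np.sum(np.asarray(density_map, dtype=float)))


if __name__ == "__main__":
    pred = [[1.0, 2.0], [3.0, 4.0]]
    target = [[1.0, 2.0], [3.0, 2.0]]
    d_loss = density_map_loss(pred, target)
    # The localization and association values below are placeholders,
    # not outputs of a real model.
    total = multi_task_loss(d_loss, localization=2.0, association=4.0,
                            w_loc=0.5, w_assoc=0.25)
    print(d_loss, total, estimated_count(pred))
```

In an end-to-end setup, a single scalar like `total` is what would be backpropagated, so gradients from all three tasks update the shared features together.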
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Robotics and Sensor-Based Localization
