MCNet: A crowd density estimation network based on integrating multiscale attention module
Qiang Guo, Rubo Zhang, Di Zhao

TL;DR
This paper introduces MCNet, a novel crowd density estimation network that combines a multiscale attention module with a lightweight feature extractor, optimized for real-time metro surveillance with limited hardware.
Contribution
The paper proposes the IMA module for enhanced semantic feature extraction and a lightweight network for efficient crowd density estimation, suitable for embedded systems.
Findings
Effective crowd texture feature extraction demonstrated on multiple datasets.
Improved accuracy in high-density, occluded, and perspective-distorted scenes.
Faster processing speed with fewer parameters suitable for embedded deployment.
Abstract
Because metro video surveillance systems have not been able to solve the metro crowd density estimation problem effectively, a Metro Crowd density estimation Network (MCNet) is proposed to automatically classify the crowd density level of passengers. First, an Integrating Multi-scale Attention (IMA) module is proposed to enhance the ability of plain classifiers to extract semantic crowd texture features and to adapt them to the characteristics of crowd texture. The innovation of the IMA module is to fuse dilated convolution, multiscale feature extraction, and an attention mechanism to obtain multi-scale crowd feature activations from a larger receptive field at lower computational cost, and to strengthen the crowd activation state of convolutional features in the top layers. Second, a novel lightweight crowd texture feature extraction network is proposed, which…
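The abstract's claim that dilated convolution gives a larger receptive field at lower computational cost can be illustrated with simple arithmetic. The sketch below is not the authors' implementation: the dilation rates (1, 2, 3), the 3x3 kernel, and the 64 input/output channels are assumptions chosen for illustration, and the function names are hypothetical. It compares the weight count of a dilated 3x3 convolution with that of a dense convolution covering the same spatial extent.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Spatial extent (in pixels) covered by one dilated convolution kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def conv_weight_count(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a 2D convolution, bias ignored. Dilation does not change this."""
    return kernel_size * kernel_size * in_channels * out_channels


def compare_branches(dilations, in_channels, out_channels, kernel_size=3):
    """For each assumed dilation rate, return
    (dilation, extent, dilated_weights, dense_weights_for_same_extent)."""
    rows = []
    for d in dilations:
        extent = effective_kernel_size(kernel_size, d)
        dilated = conv_weight_count(kernel_size, in_channels, out_channels)
        dense = conv_weight_count(extent, in_channels, out_channels)
        rows.append((d, extent, dilated, dense))
    return rows


if __name__ == "__main__":
    # Assumed branch configuration: three parallel 3x3 branches, 64 channels each.
    for d, extent, dilated, dense in compare_branches((1, 2, 3), 64, 64):
        print(f"dilation={d}: covers {extent}x{extent}, "
              f"{dilated} weights vs {dense} for a dense kernel")
```

Under these assumptions, every branch keeps the weight count of a plain 3x3 convolution, while a dense kernel covering the same extent grows quadratically with that extent. This is the general property of dilated convolution that the abstract appeals to; the actual branch layout of the IMA module is not given in the text above.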
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Video Surveillance and Tracking Methods · Time Series Analysis and Forecasting
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
