TransCrowd: weakly-supervised crowd counting with transformers
Dingkang Liang, Xiwu Chen, Wei Xu, Yu Zhou, Xiang Bai

TL;DR
TransCrowd introduces a transformer-based method for weakly-supervised crowd counting that uses self-attention to capture global context. It outperforms CNN-based weakly-supervised approaches while requiring only count-level annotations.
Contribution
TransCrowd is the first work to apply a pure transformer model to weakly-supervised crowd counting, addressing the limited context modeling of CNNs.
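To make the image-to-count idea concrete, here is a minimal pure-Python sketch, assuming a single-head self-attention layer over flattened image patches, mean pooling of the output tokens, a linear regression head, and an L1 loss against the image-level count. It is an illustration, not the authors' implementation: every function name (patchify, self_attention, predict_count, l1_count_loss, init_params) is hypothetical. Multi-head attention, position encodings, layer normalization, feed-forward layers and pretrained weights are omitted, and the weights are random and untrained.

```python
import math
import random

def patchify(image, p):
    """Split an H x W grayscale image (list of rows) into non-overlapping
    p x p patches, each flattened row-major into a vector of length p*p."""
    h, w = len(image), len(image[0])
    assert h % p == 0 and w % p == 0, "image size must be divisible by p"
    return [[image[top + i][left + j] for i in range(p) for j in range(p)]
            for top in range(0, h, p) for left in range(0, w, p)]

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def softmax(xs):
    """Numerically stable softmax."""
    mx = max(xs)
    es = [math.exp(x - mx) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product attention: every token attends to
    every other token, so one layer already has a global view."""
    d = len(wq)
    q = [matvec(wq, t) for t in tokens]
    k = [matvec(wk, t) for t in tokens]
    v = [matvec(wv, t) for t in tokens]
    weights = [softmax([sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d)
                        for kj in k]) for qi in q]
    out = [[sum(w * vj[c] for w, vj in zip(wi, v)) for c in range(d)]
           for wi in weights]
    return out, weights

def predict_count(image, params, p):
    """Image-to-count: embed patches, apply attention with a residual
    connection, average the tokens, regress a single scalar count."""
    tokens = [matvec(params["embed"], x) for x in patchify(image, p)]
    attended, _ = self_attention(tokens, params["wq"], params["wk"], params["wv"])
    tokens = [[t + a for t, a in zip(ti, ai)] for ti, ai in zip(tokens, attended)]
    pooled = [sum(col) / len(tokens) for col in zip(*tokens)]  # mean over tokens
    return sum(w * x for w, x in zip(params["head"], pooled)) + params["bias"]

def l1_count_loss(pred, count):
    """Weak supervision: only the total count of the image is compared."""
    return abs(pred - count)

def init_params(patch_dim, d, seed=0):
    """Random (untrained) weights for illustration only."""
    rng = random.Random(seed)

    def rand(rows, cols):
        return [[rng.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {"embed": rand(d, patch_dim), "wq": rand(d, d), "wk": rand(d, d),
            "wv": rand(d, d), "head": [rng.gauss(0, 0.1) for _ in range(d)],
            "bias": 0.0}

if __name__ == "__main__":
    img = [[float((r * 8 + c) % 5) for c in range(8)] for r in range(8)]
    params = init_params(patch_dim=16, d=8)  # 4 x 4 patches -> 16 values each
    pred = predict_count(img, params, p=4)
    print(pred, l1_count_loss(pred, 37.0))  # 37.0 is a made-up count label
```

The only label the loss sees is a single number per image, which is what distinguishes this count-level setup from density-map regression with point annotations.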
Findings
Outperforms CNN-based weakly-supervised methods
Achieves competitive results with fully-supervised methods
Demonstrates effectiveness across five benchmark datasets
Abstract
Mainstream crowd counting methods usually use a convolutional neural network (CNN) to regress a density map, which requires point-level annotations. However, annotating each person with a point is an expensive and laborious process. During testing, the point-level annotations are not used to evaluate counting accuracy, which means they are redundant. Hence, it is desirable to develop weakly-supervised counting methods that rely only on count-level annotations, a more economical way of labeling. Current weakly-supervised counting methods adopt a CNN to regress the total crowd count through an image-to-count paradigm. However, limited receptive fields for context modeling are an intrinsic limitation of these weakly-supervised CNN-based methods. These methods thus cannot achieve satisfactory performance, with limited applications in the…
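The receptive-field argument can be checked with a little arithmetic. The sketch below computes the theoretical receptive field of a stack of convolution and pooling layers with the standard recurrence r_l = r_(l-1) + (k_l - 1) * j_(l-1), j_l = j_(l-1) * s_l. The VGG-16-style layer list, the 384-pixel width and the 16-pixel patch size are illustrative assumptions, not values taken from the paper, and the theoretical receptive field is only an upper bound: the effective receptive field of a trained CNN is usually smaller.

```python
import math

def receptive_field(layers):
    """Theoretical receptive field (in input pixels) of a stack of layers.
    layers: list of (kernel_size, stride) pairs, applied in order."""
    r, jump = 1, 1  # a single input pixel, unit spacing between outputs
    for k, s in layers:
        r += (k - 1) * jump  # each layer widens the field by (k-1) input steps
        jump *= s            # strides compound the spacing between outputs
    return r

def layers_to_cover(width, k=3):
    """Number of stride-1 k x k convolutions needed before the receptive
    field spans `width` pixels."""
    return math.ceil((width - 1) / (k - 1))

# Assumed VGG-16-style feature extractor: 3x3 stride-1 convs, 2x2 stride-2 pools.
CONV, POOL = (3, 1), (2, 2)
VGG16_FEATURES = ([CONV] * 2 + [POOL] + [CONV] * 2 + [POOL] + [CONV] * 3 + [POOL]
                  + [CONV] * 3 + [POOL] + [CONV] * 3 + [POOL])

if __name__ == "__main__":
    print(receptive_field(VGG16_FEATURES[:-1]))  # 196 at the last conv layer
    print(receptive_field(VGG16_FEATURES))       # 212 after the final pool
    print(layers_to_cover(384))                  # 192 stride-1 3x3 convs
    # One self-attention layer over (384 // 16) ** 2 = 576 patch tokens
    # lets every token attend to every other token immediately.
    print((384 // 16) ** 2)                      # 576
```

Under these assumptions, a deep VGG-style backbone still sees at most about 200 pixels per output location, whereas a single self-attention layer relates every patch to every other patch.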
Taxonomy
Topics: Video Surveillance and Tracking Methods · Anomaly Detection Techniques and Applications · Fire Detection and Safety Systems
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Softmax · Layer Normalization · Label Smoothing · Residual Connection · Byte Pair Encoding
