An End-to-End Transformer Model for Crowd Localization
Dingkang Liang, Wei Xu, Xiang Bai

TL;DR
This paper introduces CLTR, an end-to-end transformer model for crowd localization that directly predicts head positions, outperforming previous methods that relied on complex post-processing and pseudo-bounding boxes.
Contribution
The paper presents a regression-based transformer approach for crowd localization and pairs it with a KMO-based Hungarian matcher, which uses nearby context as an auxiliary matching cost to produce more reasonable matches.
Findings
Achieves state-of-the-art localization performance on multiple datasets.
Effectively reduces ambiguous points with the KMO-based matcher.
Demonstrates robustness across various data settings.
Abstract
Crowd localization, which predicts head positions, is a more practical and higher-level task than simple counting. Existing methods employ pseudo-bounding boxes or pre-designed localization maps and rely on complex post-processing to obtain the head positions. In this paper, we propose an elegant, end-to-end Crowd Localization Transformer, named CLTR, that solves the task in a regression-based paradigm. The proposed method views crowd localization as a direct set prediction problem, taking extracted features and trainable embeddings as input to the transformer decoder. To reduce ambiguous points and generate more reasonable matching results, we introduce a KMO-based Hungarian matcher, which adopts the nearby context as an auxiliary matching cost. Extensive experiments conducted on five datasets in various data settings show the effectiveness of our method. In particular, the proposed…
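The matching idea in the abstract (a one-to-one assignment between predicted points and ground-truth heads, with nearby context as an extra cost) can be illustrated with a small sketch. Everything below is an assumption made for illustration and not the authors' implementation: the cost combines an L1 point distance, the difference in mean distance to the k nearest neighbors as the "nearby context" term, and the prediction confidence. The function names, the `context_weight` parameter, and the sample points are hypothetical. For brevity, the optimal assignment is found by exhaustive search, which reaches the same minimum as the Hungarian algorithm on tiny inputs but does not scale.

```python
from itertools import permutations
from math import dist


def knn_mean_distance(points, index, k):
    """Mean Euclidean distance from points[index] to its k nearest other points."""
    others = sorted(dist(points[index], q) for i, q in enumerate(points) if i != index)
    nearest = others[:k]
    return sum(nearest) / len(nearest) if nearest else 0.0


def matching_cost(preds, scores, gts, k=2, context_weight=1.0):
    """Build a cost matrix with one row per prediction and one column per ground truth.

    Assumed cost (illustrative only): L1 point distance
    + context_weight * |difference of k-NN mean distances| - confidence.
    """
    pred_ctx = [knn_mean_distance(preds, i, k) for i in range(len(preds))]
    gt_ctx = [knn_mean_distance(gts, j, k) for j in range(len(gts))]
    cost = []
    for i, p in enumerate(preds):
        row = []
        for j, g in enumerate(gts):
            point_term = abs(p[0] - g[0]) + abs(p[1] - g[1])
            context_term = abs(pred_ctx[i] - gt_ctx[j])
            row.append(point_term + context_weight * context_term - scores[i])
        cost.append(row)
    return cost


def match(cost):
    """Return (pred_index, gt_index) pairs that minimize the total cost.

    Exhaustive search stands in for the Hungarian algorithm here; it needs
    at least as many predictions as ground-truth points.
    """
    num_pred, num_gt = len(cost), len(cost[0])
    best = min(
        permutations(range(num_pred), num_gt),
        key=lambda perm: sum(cost[i][j] for j, i in enumerate(perm)),
    )
    return [(i, j) for j, i in enumerate(best)]


# Hypothetical data: three predicted points with confidences, two annotated heads.
preds = [(11, 10), (49, 52), (30, 30)]
scores = [0.9, 0.8, 0.1]
gts = [(10, 10), (50, 50)]
pairs = match(matching_cost(preds, scores, gts))
print(pairs)
```

In this toy case the two confident predictions that sit next to the annotated heads are matched, and the low-confidence prediction in the middle is left unmatched. A real pipeline would replace the exhaustive search with a polynomial-time Hungarian solver.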
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Video Surveillance and Tracking Methods · Human Pose and Action Recognition
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Softmax · Adam · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing · Layer Normalization
