CountFormer: Multi-View Crowd Counting Transformer
Hong Mo, Xiong Zhang, Jianchao Tan, Cheng Yang, Qiong Gu, Bo Hang, Wenqi Ren

TL;DR
CountFormer is a novel 3D multi-view crowd counting transformer that effectively integrates camera parameters and multi-view features to produce accurate scene-level density maps, outperforming existing methods.
Contribution
The paper introduces CountFormer, a concise 3D MVC framework that embeds camera parameters and employs attention-based feature lifting and aggregation for flexible multi-view crowd counting.
Findings
Outperforms state-of-the-art methods on multiple datasets.
Handles arbitrary dynamic camera layouts effectively.
Demonstrates robustness in real-world scenarios.
Abstract
Multi-view counting (MVC) methods have shown their superiority over single-view counterparts, particularly in situations characterized by heavy occlusion and severe perspective distortions. However, hand-crafted heuristic features and identical camera layout requirements in conventional MVC methods limit their applicability and scalability in real-world scenarios. In this work, we propose a concise 3D MVC framework called CountFormer to elevate multi-view image-level features to a scene-level volume representation and estimate the 3D density map based on the volume features. By incorporating a camera encoding strategy, CountFormer successfully embeds camera parameters into the volume query and image-level features, enabling it to handle various camera layouts with significant differences. Furthermore, we introduce a feature lifting module capitalized on the attention mechanism to…
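The abstract is truncated, so the exact design of the lifting module is not available here. As a rough illustration only, the general idea of attention-based feature lifting can be sketched with plain scaled dot-product cross-attention: per-voxel queries attend to image tokens from all views, with a per-camera embedding added to the keys. Every name, shape, and the additive camera embedding below is an assumption made for illustration, not the authors' implementation. The last helper relies on the standard convention in density-map counting that the count is the sum over the density map.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def lift_features(volume_queries, view_feats, cam_embeds):
    """Cross-attention from voxel queries to image tokens of all views.

    volume_queries: (Q, D) one query per voxel (hypothetical layout)
    view_feats:     (V, N, D) N image tokens for each of V views
    cam_embeds:     (V, D) one embedding per camera (hypothetical encoding)
    """
    V, N, D = view_feats.shape
    # Add each camera's embedding to its own tokens so keys carry view identity.
    keys = (view_feats + cam_embeds[:, None, :]).reshape(V * N, D)
    values = view_feats.reshape(V * N, D)
    scores = volume_queries @ keys.T / np.sqrt(D)  # (Q, V*N)
    weights = softmax(scores, axis=-1)             # each row sums to 1
    return weights @ values                        # (Q, D)

def count_from_density(density):
    # The estimated count is the sum of the density map.
    return float(density.sum())

rng = np.random.default_rng(0)
Q, V, N, D = 8, 3, 16, 4
volume = lift_features(
    rng.normal(size=(Q, D)),
    rng.normal(size=(V, N, D)),
    rng.normal(size=(V, D)),
)
```

Because the views are flattened into a single token set before attention, this sketch accepts any number of views `V` without changing the code, which is one simple way a query-based design can avoid assuming a fixed camera layout.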
Taxonomy
Topics: Video Surveillance and Tracking Methods · Data Stream Mining Techniques
Methods: Softmax · Attention Is All You Need
