Counting Varying Density Crowds Through Density Guided Adaptive Selection CNN and Transformer Estimation

Yuehai Chen; Jing Yang; Badong Chen; Shaoyi Du

arXiv:2206.10075·cs.CV·October 17, 2022

TL;DR

This paper introduces CTASNet, a novel crowd counting model that adaptively combines CNN and Transformer predictions based on density regions, effectively handling varying crowd densities with improved accuracy.

Contribution

The paper proposes a density guided adaptive selection network that dynamically chooses between CNN and Transformer for crowd counting, addressing density variation challenges.
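The adaptive selection idea can be sketched as a pixel-wise soft switch between a CNN density map and a Transformer density map. The gating rule below is our own assumption for illustration, not the paper's module: the hypothetical `adaptive_select` uses the mean of the two branch predictions as a density guide and applies a sigmoid with hypothetical `threshold` and `sharpness` parameters. Dense pixels lean on the Transformer map and sparse pixels on the CNN map. The count is the sum of the fused density map, as is standard in density-map counting.

```python
import numpy as np


def adaptive_select(cnn_map: np.ndarray, trans_map: np.ndarray,
                    threshold: float = 0.5, sharpness: float = 10.0) -> np.ndarray:
    """Blend two density maps pixel by pixel (hypothetical gating rule).

    The guide is the mean of both branch predictions. A sigmoid centred on
    `threshold` gives a weight near 1 where the guide is dense (trust the
    Transformer map) and near 0 where it is sparse (trust the CNN map).
    """
    if cnn_map.shape != trans_map.shape:
        raise ValueError("density maps must have the same shape")
    guide = 0.5 * (cnn_map + trans_map)
    weight = 1.0 / (1.0 + np.exp(-sharpness * (guide - threshold)))
    return weight * trans_map + (1.0 - weight) * cnn_map


def count(density_map: np.ndarray) -> float:
    """Density-map counting: the predicted count is the sum over all pixels."""
    return float(density_map.sum())


# Toy 1x2 maps: the left pixel is sparse, the right pixel is dense.
cnn = np.array([[0.0, 2.0]])
trans = np.array([[0.1, 3.0]])
fused = adaptive_select(cnn, trans)
print(fused)         # left pixel stays close to cnn, right pixel close to trans
print(count(fused))
```

A hard 0/1 mask would match "selection" more literally. The sigmoid keeps the blend differentiable, which is one common reason to prefer a soft gate in a trainable model.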

Findings

01. Outperforms existing methods on four challenging datasets.
02. Effectively handles both low-density and high-density crowd regions.
03. Reduces the impact of annotation noise with a novel loss function.

Abstract

In real-world crowd counting applications, crowd densities within an image vary greatly. When facing density variation, humans tend to locate and count individual targets in low-density regions and infer the number in high-density regions. We observe that CNNs focus on local information correlation through fixed-size convolution kernels, whereas the Transformer can effectively extract semantic crowd information through its global self-attention mechanism. Thus, a CNN can locate and estimate crowds accurately in low-density regions but struggles to perceive densities properly in high-density regions. Conversely, the Transformer is highly reliable in high-density regions but fails to locate targets in sparse regions. Neither CNN nor Transformer alone handles this kind of density variation well. To address this problem, we propose a CNN and Transformer Adaptive Selection…
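The abstract's core observation is that a convolution output depends only on a fixed local window, while self-attention mixes information from every position. A small NumPy sketch can check this. It is an illustration under our own assumptions, not code from the paper: a toy single-channel image, one attention head with random weights, and hypothetical helpers `conv3x3` and `self_attention`. It perturbs the bottom-right pixel and checks which outputs can see it.

```python
import numpy as np


def conv3x3(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """3x3 cross-correlation with zero padding; each output sees only a 3x3 window."""
    h, w = x.shape
    p = np.pad(x, 1)  # constant zero padding keeps the output the same size as x
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(p[i:i + 3, j:j + 3] * k)
    return out


def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention; returns (output, weights)."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # each row sums to 1
    return attn @ v, attn


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))          # toy single-channel "image"
kernel = rng.standard_normal((3, 3))
wq, wk, wv = (rng.standard_normal((1, 4)) for _ in range(3))

x2 = x.copy()
x2[7, 7] += 5.0                          # perturb the bottom-right pixel

conv_before, conv_after = conv3x3(x, kernel), conv3x3(x2, kernel)
out, attn = self_attention(x.reshape(-1, 1), wq, wk, wv)  # 64 pixel tokens

print(conv_before[0, 0] == conv_after[0, 0])  # True: (7, 7) is outside the window of (0, 0)
print(attn[0, 63] > 0)                        # True: token 0 attends to token 63
```

Stacking convolutions enlarges the receptive field, but each layer remains local. A single attention layer already connects every pair of positions.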

Taxonomy

Methods: Attention Is All You Need · Linear Layer · Softmax · Dropout · Dense Connections · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Multi-Head Attention · Byte Pair Encoding · Label Smoothing