# SPCANet: congested crowd counting via strip pooling combined attention network

**Authors:** Zhongyuan Yuan

PMC · DOI: 10.7717/peerj-cs.2273 · PeerJ Computer Science · 2024-09-18

## TL;DR

This paper introduces SPCANet, a new model for counting people in crowded scenes that improves accuracy by handling perspective distortion and occlusion better than previous methods.

## Contribution

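The novel contribution is SPCANet, a strip pooling combined attention network built on normed-deformable convolution (NDConv). It adds strip pooling to model long-range spatial dependencies and efficient channel attention (ECA) to reweight feature channels, improving how spatial information is captured in dense crowds.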
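To make the strip pooling idea concrete, here is a minimal single-channel sketch in plain Python. It pools each full row (a 1×W strip) and each full column (an H×1 strip), broadcasts the two descriptors back to H×W, and uses a sigmoid of their sum to reweight the input. The function names are hypothetical. The sketch deliberately omits the learned 1D convolutions and fusion convolution that a trainable strip pooling module would include, and it is not SPCANet's implementation.

```python
import math
from typing import List

Matrix = List[List[float]]  # one H x W feature map (a single channel)


def horizontal_strip_pool(x: Matrix) -> List[float]:
    """Average over each full row (a 1 x W strip): returns H values."""
    return [sum(row) / len(row) for row in x]


def vertical_strip_pool(x: Matrix) -> List[float]:
    """Average over each full column (an H x 1 strip): returns W values."""
    height = len(x)
    return [sum(row[j] for row in x) / height for j in range(len(x[0]))]


def strip_context(x: Matrix) -> Matrix:
    """Expand both strip descriptors back to H x W and add them.

    Entry (i, j) is the mean of row i plus the mean of column j, so every
    position sees its entire row and column: long-range context that a
    small square pooling window cannot provide.
    """
    row_means = horizontal_strip_pool(x)
    col_means = vertical_strip_pool(x)
    return [[r + c for c in col_means] for r in row_means]


def strip_pool_gate(x: Matrix) -> Matrix:
    """Reweight the input by sigmoid(strip context), element-wise.

    Simplification (hypothetical): no learned convolutions are applied to
    the strips or to their fused context before the sigmoid.
    """
    ctx = strip_context(x)
    return [
        [v / (1.0 + math.exp(-g)) for v, g in zip(x_row, ctx_row)]
        for x_row, ctx_row in zip(x, ctx)
    ]


if __name__ == "__main__":
    fmap = [[1.0, 2.0, 3.0],
            [4.0, 5.0, 6.0]]
    print(strip_context(fmap))  # [[4.5, 5.5, 6.5], [7.5, 8.5, 9.5]]
    print(strip_pool_gate(fmap))
```

Because each strip spans the whole width or height, an occluded head in one region still receives context from every other position in its row and column, which is the motivation for preferring long, narrow kernels over square ones in dense scenes.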

## Key findings

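- SPCANet improves on the baseline's mean absolute error (MAE) on four crowd-counting datasets, reporting MAE values of 60.9, 7.3, 90.8, and 161.1 on ShanghaiTech Part A, ShanghaiTech Part B, UCF-QNRF, and UCF_CC_50, respectively.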
- The model reduces mean squared error by 5.7% on average across datasets.
- SPCANet improves robustness in handling perspective distortion and dense occlusion.
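MAE and MSE in crowd counting are computed over the test images from predicted and ground-truth head counts. The sketch below assumes the convention common in crowd-counting papers, where the value reported as "MSE" is the square root of the mean squared count error; this summary does not state which definition SPCANet uses. `counting_errors` is a hypothetical helper, and the counts in the usage example are made up for illustration.

```python
import math
from typing import Sequence, Tuple


def counting_errors(pred: Sequence[float], gt: Sequence[float]) -> Tuple[float, float]:
    """Return (MAE, MSE) over N test images, crowd-counting style.

    Assumption: MSE follows the common crowd-counting convention of taking
    the square root of the mean squared error (numerically an RMSE).
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("need equal-length, non-empty count lists")
    n = len(pred)
    mae = sum(abs(p - g) for p, g in zip(pred, gt)) / n
    mse = math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / n)
    return mae, mse


if __name__ == "__main__":
    predicted = [102.0, 48.5, 310.0]     # hypothetical model counts
    ground_truth = [100.0, 50.0, 300.0]  # hypothetical annotated counts
    mae, mse = counting_errors(predicted, ground_truth)
    print(f"MAE={mae:.2f}  MSE={mse:.2f}")
```

MAE reflects accuracy, while the squared term makes MSE sensitive to a few badly miscounted images, which is why papers often read it as a robustness indicator.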

## Abstract

Crowd counting aims to estimate the number and distribution of people in crowded places and is an important research direction in object counting. It is widely used in public-space management, crowd behavior analysis, and other scenarios, which demonstrates its practical value. Crowd-counting technology has developed rapidly in recent years. However, in highly crowded and noisy scenes, the counting accuracy of most models is still seriously degraded by perspective distortion, dense occlusion, and uneven crowd distribution. Perspective distortion makes people appear at different sizes and shapes in the image, while dense occlusion and uneven distribution prevent parts of the crowd from being captured completely. As a result, the model captures spatial information imperfectly. To address these problems, we propose the strip pooling combined attention network (SPCANet), built on normed-deformable convolution (NDConv). We model long-range dependencies more efficiently by introducing strip pooling: in contrast to traditional square-kernel pooling, strip pooling uses long, narrow kernels (1×N or N×1) to handle dense crowds, mutual occlusion, and overlap. SPCANet also introduces efficient channel attention (ECA), a mechanism that learns channel attention through a local cross-channel interaction strategy. ECA generates channel attention with a fast 1D convolution, reducing model complexity while improving performance as much as possible. Extensive experiments on four mainstream datasets, ShanghaiTech Part A, ShanghaiTech Part B, UCF-QNRF, and UCF_CC_50, show that SPCANet improves on the baseline, reaching mean absolute errors (MAE) of 60.9, 7.3, 90.8, and 161.1, respectively, which validates its effectiveness. Meanwhile, mean squared error (MSE) decreases by 5.7% on average over the four datasets, and robustness is greatly improved.
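A minimal plain-Python sketch of the efficient channel attention (ECA) step: global average pooling gives one descriptor per channel, a 1D convolution of odd length k mixes each descriptor with its k nearest neighbouring channels, and a sigmoid turns the result into per-channel weights that rescale the feature maps. The adaptive kernel-size rule (γ = 2, b = 1) is the default from the original ECA module and is assumed, not confirmed, for SPCANet. The fixed `weights` list is a hypothetical stand-in for the learned convolution kernel.

```python
import math
from typing import List

FeatureMap = List[List[float]]  # one H x W channel


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Adaptive kernel size: |(log2 C + b) / gamma|, bumped to the next odd value."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca(x: List[FeatureMap], weights: List[float]) -> List[FeatureMap]:
    """Efficient channel attention over the C channels in `x`.

    `weights` plays the role of the learned 1D kernel (odd length k); in a
    trained network these values would be learned, not hand-set.
    """
    k = len(weights)
    if k % 2 == 0:
        raise ValueError("kernel size must be odd for symmetric padding")
    pad = k // 2
    # 1. Global average pooling: one descriptor per channel.
    pooled = [sum(map(sum, fm)) / (len(fm) * len(fm[0])) for fm in x]
    # 2. 1D convolution across the channel axis with zero padding, so each
    #    channel's weight depends only on its k nearest channels.
    padded = [0.0] * pad + pooled + [0.0] * pad
    attn = [
        1.0 / (1.0 + math.exp(-sum(w * padded[c + t] for t, w in enumerate(weights))))
        for c in range(len(x))
    ]
    # 3. Rescale every channel by its attention weight.
    return [[[v * a for v in row] for row in fm] for fm, a in zip(x, attn)]


if __name__ == "__main__":
    print(eca_kernel_size(64), eca_kernel_size(256), eca_kernel_size(512))  # 3 5 5
    feats = [[[1.0, 1.0]], [[2.0, 2.0]], [[3.0, 3.0]], [[4.0, 4.0]]]
    print(eca(feats, weights=[0.25, 0.5, 0.25]))
```

The cost argument is visible in the code: the attention step adds only k weights regardless of the channel count C, instead of the fully connected layers a squeeze-and-excitation block would use, which is consistent with the abstract's claim that ECA keeps model complexity low.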

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11419659/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11419659/full.md

## References

48 references — full list in the complete paper: https://tomesphere.com/paper/PMC11419659/full.md

---
Source: https://tomesphere.com/paper/PMC11419659