CCTrans: Simplifying and Improving Crowd Counting with Transformer

Ye Tian; Xiangxiang Chu; Hongpeng Wang

arXiv:2109.14483 · cs.CV · September 30, 2021 · 64 citations

TL;DR

CCTrans leverages a transformer-based architecture to effectively model global context in crowd counting, achieving state-of-the-art results and simplifying the traditional CNN-based pipeline.
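To show why a transformer sees global context, here is a minimal sketch of single-head scaled dot-product self-attention in pure Python. It is not the CCTrans backbone. It assumes identity query/key/value projections (real vision transformers learn these as linear layers), and the function names are hypothetical. The point is that every output token is a weighted mix of every input token, whereas one output of a 3×3 convolution sees only a 3×3 neighborhood.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention (identity Q/K/V projections assumed).

    tokens: list of N feature vectors, each of length d.
    Returns N output vectors; each one mixes information from *all* N input tokens.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for q in tokens:
        # Similarity of this query token to every key token in the image.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        # Weighted sum of all value vectors: a global receptive field.
        out = [sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)]
        outputs.append(out)
    return outputs


# Changing a far-away token changes the output at token 0.
a = self_attention([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
b = self_attention([[1.0, 0.0], [0.0, 1.0], [3.0, 3.0]])
print(a[0], b[0])
```

In an image transformer the tokens are patch embeddings, so this all-pairs mixing is what lets distant crowd regions inform each other, at a cost quadratic in the number of tokens.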

Contribution

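Introduces CCTrans, a transformer-based crowd counting model that combines a pyramid vision transformer backbone, pyramid feature aggregation (PFA) of low-level and high-level features, and a regression head with multi-scale dilated convolution (MDC), outperforming previous methods in counting accuracy.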
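The idea behind pyramid feature aggregation can be sketched as bringing feature maps from several backbone stages to one resolution and fusing them. The sketch below is a simplified illustration under stated assumptions, not the paper's PFA: it uses single-channel maps, nearest-neighbor upsampling, and element-wise summation, and the helper names `upsample_nearest` and `aggregate_pyramid` are hypothetical. A real implementation would also project stages with different channel counts to a common width before fusing.

```python
def upsample_nearest(fmap, factor):
    """Nearest-neighbor upsampling of a 2D map (list of rows) by an integer factor."""
    return [
        [v for v in row for _ in range(factor)]  # repeat each value horizontally
        for row in fmap
        for _ in range(factor)  # repeat each row vertically
    ]


def aggregate_pyramid(features):
    """Fuse multi-scale maps by upsampling each to the finest resolution and summing.

    features: 2D maps ordered fine-to-coarse; each coarser map's side length must
    evenly divide the finest map's side length (assumption of this sketch).
    """
    target_h, target_w = len(features[0]), len(features[0][0])
    fused = [[0.0] * target_w for _ in range(target_h)]
    for fmap in features:
        factor = target_h // len(fmap)
        up = upsample_nearest(fmap, factor) if factor > 1 else fmap
        assert len(up) == target_h and len(up[0]) == target_w
        for i in range(target_h):
            for j in range(target_w):
                fused[i][j] += up[i][j]  # low-level detail + high-level context
    return fused


fine = [[1.0] * 4 for _ in range(4)]   # high-resolution, low-level stage
mid = [[2.0, 0.0], [0.0, 2.0]]         # intermediate stage
coarse = [[10.0]]                      # lowest-resolution, high-level stage
print(aggregate_pyramid([fine, mid, coarse]))
```

Summation keeps the channel count fixed; concatenation followed by a projection is a common alternative when stages should be weighted differently.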

Findings

1. Achieves new state-of-the-art results on multiple benchmarks.

2. Ranks No. 1 on the NWPU-Crowd leaderboard.

3. Effective in both weakly supervised and fully supervised settings.
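In the weakly supervised setting only an image-level count is available (no per-head point annotations), so supervision can act only on the sum of the predicted density map. The sketch below shows that idea with a plain L1 count loss. It is an assumption for illustration: the paper tailors its own loss functions for both settings, which this sketch does not reproduce, and the function names are hypothetical.

```python
def predicted_count(density):
    """Crowd count = sum (discrete integral) of the predicted density map."""
    return sum(sum(row) for row in density)


def weak_count_loss(density, gt_count):
    """L1 distance between predicted and annotated count (count-level supervision only)."""
    return abs(predicted_count(density) - gt_count)


def batch_weak_loss(densities, gt_counts):
    """Mean count loss over a batch of density maps."""
    losses = [weak_count_loss(d, c) for d, c in zip(densities, gt_counts)]
    return sum(losses) / len(losses)


density = [[0.5, 0.5], [1.0, 0.0]]  # predicted map whose values sum to 2.0
print(weak_count_loss(density, 3))  # count error of 1 person
```

Because the loss only sees the total, it gives no signal about where people are; the fully supervised setting adds point annotations that can constrain the spatial distribution as well.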

Abstract

Most recent crowd counting methods are based on convolutional neural networks (CNNs), which are strong at extracting local features. However, CNNs inherently struggle to model global context because of their limited receptive fields, whereas transformers can model global context easily. In this paper, we propose a simple approach called CCTrans to simplify the design pipeline. Specifically, we use a pyramid vision transformer backbone to capture global crowd information, a pyramid feature aggregation (PFA) model to combine low-level and high-level features, and an efficient regression head with multi-scale dilated convolution (MDC) to predict density maps. In addition, we tailor the loss functions to our pipeline. Without bells and whistles, extensive experiments demonstrate that our method achieves new state-of-the-art results on several benchmarks both in weakly and…
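A multi-scale dilated convolution head runs convolutions with different dilation rates in parallel, so the same 3×3 kernel covers a (2d+1)×(2d+1) area at dilation d without adding parameters. The pure-Python sketch below illustrates that mechanism only. The dilation rates (1, 2, 3), zero "same" padding, and fusion by element-wise sum are assumptions, not the paper's exact MDC design, and the function names are hypothetical.

```python
def dilated_conv2d(img, kernel, dilation):
    """3x3 'same' convolution (cross-correlation) with zero padding equal to the dilation."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for ki in range(3):
                for kj in range(3):
                    # Taps are spaced `dilation` pixels apart around (i, j).
                    y = i + (ki - 1) * dilation
                    x = j + (kj - 1) * dilation
                    if 0 <= y < h and 0 <= x < w:  # zero padding outside the map
                        acc += kernel[ki][kj] * img[y][x]
            out[i][j] = acc
    return out


def multi_scale_dilated(img, kernels, dilations=(1, 2, 3)):
    """Apply parallel dilated branches and fuse them by element-wise sum (assumed fusion)."""
    branches = [dilated_conv2d(img, k, d) for k, d in zip(kernels, dilations)]
    h, w = len(img), len(img[0])
    return [[sum(b[i][j] for b in branches) for j in range(w)] for i in range(h)]


# An impulse reveals which output positions each branch can "see".
img = [[0.0] * 7 for _ in range(7)]
img[3][3] = 1.0
ones = [[1.0] * 3 for _ in range(3)]
density = multi_scale_dilated(img, [ones, ones, ones])
print(density[3][3], density[0][0])  # center seen by all branches; far corner only by d=3
```

Larger dilation rates let the head gather context over wider regions, which helps with the large scale variation of heads in crowd images.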

Taxonomy

Topics: Video Surveillance and Tracking Methods · Anomaly Detection Techniques and Applications · Fire Detection and Safety Systems

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Layer Normalization · Dense Connections · Dilated Convolution · Residual Connection · Vision Transformer · Convolution