Gramformer: Learning Crowd Counting via Graph-Modulated Transformer

Hui Lin; Zhiheng Ma; Xiaopeng Hong; Qinnan Shangguan; Deyu Meng

arXiv:2401.03870 · cs.CV · January 9, 2024 · 2 cites

TL;DR

Gramformer introduces a graph-modulated transformer for crowd counting, enhancing attention diversity and node importance encoding to improve accuracy on challenging datasets.

Contribution

The paper proposes a novel graph-modulated transformer that adjusts attention and node features based on dissimilarity and centrality graphs, addressing homogenization issues.

Findings

01. Outperforms existing methods on four crowd counting datasets.

02. Effectively diversifies attention maps to capture complementary information.

03. Demonstrates robustness and competitiveness in crowd counting tasks.

Abstract

Transformers have been popular in recent crowd counting work because they overcome the limited receptive field of traditional CNNs. However, since crowd images always contain a large number of similar patches, the self-attention mechanism in a Transformer tends to find a homogenized solution where the attention maps of almost all patches are identical. In this paper, we address this problem by proposing Gramformer: a graph-modulated transformer that enhances the network by adjusting the attention and the input node features, respectively, on the basis of two different types of graphs. First, an attention graph is proposed to diversify attention maps so that they attend to complementary information. The graph is built upon the dissimilarities between patches, modulating the attention in an anti-similarity fashion. Second, a feature-based centrality encoding is proposed to discover the centrality positions or…
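
The anti-similarity idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the cosine-based dissimilarity, the top-k edge rule, the hard masking of attention logits, and the names `dissimilarity_graph`, `graph_modulated_attention`, and `k` are all assumptions made for illustration. It only shows how a graph built from patch dissimilarities could restrict each patch to attend to patches that look unlike it.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; masked (-inf) entries become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dissimilarity_graph(feats, k):
    """Hypothetical edge rule: link each patch to its k least-similar patches."""
    unit = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    dissim = 1.0 - unit @ unit.T          # cosine dissimilarity (assumption)
    np.fill_diagonal(dissim, -np.inf)     # a patch never selects itself
    idx = np.argsort(-dissim, axis=1)[:, :k]
    adj = np.zeros(dissim.shape, dtype=bool)
    np.put_along_axis(adj, idx, True, axis=1)
    return adj


def graph_modulated_attention(feats, w_q, w_k, w_v, k):
    """Single-head attention whose logits are masked by the dissimilarity graph."""
    q, key, v = feats @ w_q, feats @ w_k, feats @ w_v
    logits = q @ key.T / np.sqrt(q.shape[1])
    adj = dissimilarity_graph(feats, k)
    logits = np.where(adj, logits, -np.inf)  # attend only along graph edges
    attn = softmax(logits, axis=-1)
    return attn @ v, attn


# Toy usage: 6 patches with 4-dimensional features and random projections.
rng = np.random.default_rng(0)
n, d = 6, 4
feats = rng.normal(size=(n, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
out, attn = graph_modulated_attention(feats, w_q, w_k, w_v, k=2)
```

In this sketch every row of `attn` has non-zero weight on exactly `k` patches, none of them the patch itself, so near-duplicate patches cannot all collapse onto the same attention map. A soft bias on the logits would be an equally plausible alternative to the hard mask used here.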

Code & Models

Repositories

LoraLinH/Gramformer (PyTorch, official)

Videos

Gramformer: Learning Crowd Counting via Graph-Modulated Transformer· underline

Taxonomy

Topics: Video Surveillance and Tracking Methods · Anomaly Detection Techniques and Applications · Human Mobility and Location-Based Analysis

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Adam · Layer Normalization · Residual Connection · Absolute Position Encodings · Dense Connections · Position-Wise Feed-Forward Layer