GroupFormer: Group Activity Recognition with Clustered Spatial-Temporal Transformer

Shuaicheng Li; Qianggang Cao; Lingbo Liu; Kunlin Yang; Shinan Liu; Jun Hou; Shuai Yi

arXiv:2108.12630·cs.CV·August 31, 2021·5 cites

TL;DR

GroupFormer is a novel transformer-based model that jointly captures spatial-temporal interactions for group activity recognition, outperforming previous methods on key datasets.

Contribution

It introduces a clustered spatial-temporal transformer that models spatial and temporal dependencies jointly and dynamically clusters individuals to improve their semantic representations.
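The clustering idea can be illustrated with a minimal sketch. It assumes individual features are plain vectors, uses them directly as queries, keys and values (no learned projections, single head), and substitutes a fixed nearest-centroid assignment for whatever learned clustering the paper actually uses. `clustered_self_attention` and `assign_clusters` are hypothetical names, not functions from the xueyee/groupformer repository.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def assign_clusters(features, centroids):
    """Hypothetical helper: put each individual in the cluster whose
    centroid has the largest dot-product similarity with its feature."""
    return [
        max(range(len(centroids)), key=lambda k: dot(f, centroids[k]))
        for f in features
    ]


def clustered_self_attention(features, centroids):
    """Hypothetical sketch: scaled dot-product self-attention in which each
    individual attends only to members of its own cluster.
    Returns (updated_features, assignments)."""
    assignments = assign_clusters(features, centroids)
    d = len(features[0])
    scale = 1.0 / math.sqrt(d)
    updated = []
    for i, q in enumerate(features):
        # Members of i's cluster (always includes i itself, so never empty).
        members = [j for j, c in enumerate(assignments) if c == assignments[i]]
        weights = softmax([dot(q, features[j]) * scale for j in members])
        # Weighted average of the cluster members' features.
        updated.append(
            [sum(w * features[j][k] for w, j in zip(weights, members)) for k in range(d)]
        )
    return updated, assignments
```

With features `[[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]` and centroids `[[1.0, 0.0], [0.0, 1.0]]`, the first two individuals fall into cluster 0 and the last two into cluster 1, so each updated feature mixes only its own cluster's members. Restricting attention this way is one plausible reading of "clustering individuals"; the actual model may cluster and attend differently.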

Findings

01 Outperforms the previous state of the art on the Volleyball dataset

02 Outperforms the previous state of the art on the Collective Activity dataset

03 Effectively models spatial-temporal dependencies

Abstract

Group activity recognition is a crucial yet challenging problem, whose core lies in fully exploring spatial-temporal interactions among individuals and generating reasonable group representations. However, previous methods either model spatial and temporal information separately, or directly aggregate individual features to form group features. To address these issues, we propose a novel group activity recognition network termed GroupFormer. It captures spatial-temporal contextual information jointly to augment the individual and group representations effectively with a clustered spatial-temporal transformer. Specifically, our GroupFormer has three appealing advantages: (1) A tailor-modified Transformer, Clustered Spatial-Temporal Transformer, is proposed to enhance the individual representation and group representation. (2) It models the spatial and temporal dependencies integrally and…
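The contrast the abstract draws between modeling space and time separately and modeling them jointly can be sketched under the same simplifying assumptions (no learned projections, single head, no position encodings). The hypothetical `joint_spatiotemporal_attention` flattens all T frames and N individuals into one sequence of T * N tokens, so individual n at frame t can attend directly to individual m at a different frame t'. The hypothetical `spatial_only_support` shows, for comparison, that a purely spatial step only reaches the N tokens of the same frame. This illustrates joint attention in general and is not the GroupFormer implementation.

```python
import math


def joint_spatiotemporal_attention(x):
    """Hypothetical sketch of joint spatial-temporal self-attention.

    x[t][n] is the feature vector of individual n in frame t. All T * N
    tokens are flattened (index t * N + n) and attend to each other in one
    step. Returns (outputs reshaped to [T][N][d], attention matrix).
    """
    T, N = len(x), len(x[0])
    tokens = [x[t][n] for t in range(T) for n in range(N)]
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    outputs, attn = [], []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) * scale for k in tokens]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        attn.append(weights)
        outputs.append(
            [sum(w * tok[c] for w, tok in zip(weights, tokens)) for c in range(d)]
        )
    # Reshape the flat token outputs back into [T][N][d].
    return [outputs[t * N:(t + 1) * N] for t in range(T)], attn


def spatial_only_support(T, N):
    """For comparison: token indices that each query (in t * N + n order)
    can reach in a purely spatial attention step, i.e. only its own frame."""
    return [[t * N + j for j in range(N)] for t in range(T) for _ in range(N)]
```

For a clip with T = 2 frames and N = 3 individuals, `attn` is a 6 x 6 matrix whose rows each sum to 1 and whose entries are all positive, whereas each row of `spatial_only_support(2, 3)` lists only 3 indices. A factorized design would therefore need a separate temporal step to mix information across frames.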

Code & Models

Repositories

xueyee/groupformer (PyTorch, official)

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Analysis and Summarization

Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Dropout · Layer Normalization · Dense Connections · Byte Pair Encoding · Softmax